From: Hin-Tak Leung <hintak.leung@gmail.com>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: git gc expanding packed data?
Date: Tue, 11 Aug 2009 11:17:14 +0100
Message-ID: <3ace41890908110317k6e6ada07jc39ea446f9fa246e@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.00.0908042203380.16073@xanadu.home>

On Wed, Aug 5, 2009 at 11:39 PM, Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 4 Aug 2009, Hin-Tak Leung wrote:
>
>> I cloned gcc's git about a week ago to work on some problems I have
>> with gcc on minor platforms, just plain 'git clone
>> git://gcc.gnu.org/git/gcc.git gcc', and ran 'git fetch' about daily, and
>> 'git rebase origin' from time to time. I don't have local changes,
>> just following and monitoring what's going on in gcc. So after a week,
>> I thought I'd do a 'git gc'. Then it went very bizarre.
>>
>> Before I started 'git gc', the whole of .git was about 700MB and
>> .git/objects/pack was a bit under 600MB, with a few other directories
>> under .git/objects at tens of KB and a few at 30000-40000 KB, and the
>> checkout was, well, the size of gcc source code. But after I started
>> 'git gc', the message stayed at 'counting objects' at about 900,000
>> for a long time, while a lot of directories under .git/objects/ got a
>> bit large, and .git blew up to at least 7GB with a lot of small files
>> under .git/objects/*/. Seeing that I would run out of disk space,
>> I killed the whole lot and ran 'git clone' again, since I don't have any
>> local changes and there is nothing to lose.
>>
>> I am running git version 1.6.2.5 (Fedora 11). Is there any reason why
>> 'git gc' does that?
>
> There is probably a reason, although a bad one for sure.
>
> Well... OK.
>
> It appears that the git installation serving clone requests for
> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> just cloned it and the pack I was sent contains 1383356 objects (can be
> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> However, there are only 978501 actually referenced objects in that
> cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
> for 404855 useless objects in the cloned repository.
>
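The subtraction above can be reproduced with a short shell sketch. The
two counts are the ones quoted in the message. The helper names
count_packed and count_referenced are hypothetical wrappers around the
two commands Nicolas mentions, and they assume they are run from the top
of a non-bare clone. An object stored in more than one pack would be
counted once per pack.

```shell
# Hypothetical helper: count the objects indexed by every pack in the clone.
count_packed() {
    total=0
    for idx in .git/objects/pack/*.idx; do
        [ -e "$idx" ] || continue            # glob did not match: no packs
        n=$(git show-index < "$idx" | wc -l)
        total=$((total + n))
    done
    echo "$total"
}

# Hypothetical helper: count the objects reachable from any ref.
count_referenced() {
    git rev-list --all --objects | wc -l
}

# Figures quoted in the message, used here instead of a live repository.
packed=1383356
referenced=978501
unreferenced=$((packed - referenced))
echo "unreferenced objects: $unreferenced"
```

In a real clone, the two assignments could be replaced by
packed=$(count_packed) and referenced=$(count_referenced).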
> Now git has a safety mechanism to _not_ delete unreferenced objects
> right away when running 'git gc'. By default unreferenced objects are
> kept around for a period of 2 weeks. This is to make it easy for you to
> recover accidentally deleted branches or commits, and to avoid a race
> where a just-created object that is about to be referenced, but is not
> yet, could be deleted by a 'git gc' process running in parallel.
>
> So to give that grace period to packed but unreferenced objects, the
> repack process pushes those unreferenced objects out of the pack into
> their loose form so they can be aged and eventually pruned. Objects
> becoming unreferenced are usually not that many though. Having 404855
> unreferenced objects is quite a lot, and being sent those objects in the
> first place via a clone is stupid and a complete waste of network
> bandwidth.
>
> Does anyone have an idea which git version is running on gcc.gnu.org?
> It is certainly buggy and needs fixing.
>
> Anyway... To solve your problem, you simply need to run 'git gc' with
> the --prune=now argument to disable that grace period and get rid of
> those unreferenced objects right away (safe only if no other git
> activities are taking place at the same time which should be easy to
> ensure on a workstation). The resulting .git/objects directory size
> will shrink to about 441 MB. If the gcc.gnu.org git server was doing
> its job properly, the size of the clone transfer would also be
> significantly smaller, meaning around 414 MB instead of the current 600+
> MB.
>
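A minimal sketch of what --prune=now does, run in a throwaway repository
rather than a real clone. The scratch directory and the unreferenced blob
are made up for illustration, and the sketch assumes a git that accepts
'git gc --prune=now'. The grace period itself is governed by the
gc.pruneExpire configuration variable.

```shell
# Work in a throwaway repository so nothing real can be pruned by mistake.
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch" || exit 1

# Write a blob that no ref, index entry, or commit points at.
blob=$(echo "unreferenced data" | git hash-object -w --stdin)

# A plain 'git gc' would keep this recent object for the grace period
# (gc.pruneExpire); --prune=now drops unreferenced objects immediately.
git gc --quiet --prune=now

# 'git cat-file -e' exits non-zero when the object no longer exists.
if git cat-file -e "$blob" 2>/dev/null; then
    pruned=no
else
    pruned=yes
fi
echo "blob pruned: $pruned"
```

As the message says, this is only safe when no other git process is
working in the same repository at the same time.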
> And BTW, using 'git gc --aggressive' with a later git version (or
> 'git repack -a -f -d --window=250 --depth=250') gives me a .git/objects
> directory size of 310 MB, meaning that the actual repository with all
> the trunk history is _smaller_ than the actual source checkout. If that
> repository was properly repacked on the server, the clone data transfer
> would be 283 MB. This is less than half the current clone transfer
> size.
>
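One way to check whether a repack actually saves space is to measure the
object store before and after. The sketch below does that in a toy
repository with the repack flags quoted above. The scratch repository,
the file name, the demo identity, and the objects_kib helper are all
hypothetical, and 'git -C' assumes a reasonably recent git. On three tiny
commits the sizes mean little; the point is the measurement itself.

```shell
# Hypothetical helper: size of a repository's object store in KiB.
objects_kib() {
    du -sk "$1/.git/objects" | cut -f1
}

# Build a small throwaway repository with a few revisions of one file.
repo=$(mktemp -d)
git init -q "$repo"
for i in 1 2 3; do
    echo "revision $i" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "revision $i"
done

before=$(objects_kib "$repo")

# Repack everything into a single pack, recomputing all deltas.
git -C "$repo" repack -a -f -d -q --window=250 --depth=250

after=$(objects_kib "$repo")
packs=$(git -C "$repo" count-objects -v | awk '/^packs:/ { print $2 }')
echo "before=${before}KiB after=${after}KiB packs=${packs}"
```

Running the same measurement around 'git gc --aggressive' on a real clone
shows whether the pack grew or shrank.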
>
> Nicolas
>
'git gc --prune=now' does work, but 'git gc --prune=now --aggressive'
(before) and 'git gc --aggressive' (after) both create very large
packs (>2GB; I stopped them) from the ~400MB-600MB of packed objects. I
noted that you specifically wrote 'with a later git version' -
presumably there is some sort of known and fixed issue there? Just
curious.

I guess --aggressive doesn't always save space...

Hin-Tak