From: Nicolas Pitre <nico@cam.org>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Mike Hommey <mh@glandium.org>, Teemu Likonen <tlikonen@iki.fi>,
    Johannes Schindelin <Johannes.Schindelin@gmx.de>,
    git@vger.kernel.org
Subject: Re: Why repository grows after "git gc"? / Purpose of *.keep files?
Date: Tue, 13 May 2008 21:03:31 -0400 (EDT)
Message-ID: <alpine.LFD.1.10.0805132005550.23581@xanadu.home>
In-Reply-To: <20080513001252.GB29038@spearce.org>
On Mon, 12 May 2008, Shawn O. Pearce wrote:
> Mike Hommey <mh@glandium.org> wrote:
> > On Mon, May 12, 2008 at 11:03:04PM +0200, Mike Hommey wrote:
> > > On Mon, May 12, 2008 at 11:24:14PM +0300, Teemu Likonen wrote:
> > > > But I have experienced the same earlier with some other post-1.5.5
> > > > version so I believe you can reproduce this yourself. After cloning
> > > > Linus's linux-2.6 repo its .git directory weighs 209MB. After a single
> > > > "git pull" and "git gc" it was 298MB in my test.
> > >
> > > I noticed that a while ago: when repacking multiple packs when one has a
> > > .keep file, the resulting additional pack contains too many blobs and
> > > trees, contrary to when only packing loose objects:
> > (...)
> >
> > That is, it seems to also contain all the blobs and subtrees for all the
> > commits the pack contains, even when they already are in the pack having
> > a .keep file.
>
> I've noticed this too. Like since day 1 when we added .keep.
> But uh, nobody else complained and I forgot about it.
Well, now that I've reproduced Teemu Likonen's test case, I can confirm
this is actually a problem. Here I get:
|remote: Counting objects: 523, done.
|remote: Compressing objects: 100% (57/57), done.
|remote: Total 362 (delta 305), reused 362 (delta 305)
|Receiving objects: 100% (362/362), 65.37 KiB, done.
|Resolving deltas: 100% (305/305), completed with 105 local objects.
|From ../test1
| 492c2e4..9404ef0 master -> master
The received pack is 449135 bytes. That is much larger than the
65.37 KiB actually transferred, but we're completing a thin pack with
105 undeltified objects, which accounts for the size increase and is
expected. So far so good.
Now, in theory, running 'git gc' should only repack those 362 + 105
objects, since the remaining ones are all found in the .keep flagged
pack. But that's not what's happening at all:
|Counting objects: 26559, done.
|Compressing objects: 100% (24708/24708), done.
|Writing objects: 100% (26559/26559), done.
|Total 26559 (delta 3054), reused 14011 (delta 1613)
So... there is something definitely wrong here. The expectation was
to get a pack in the same size range as the one received during the
pull, or somewhat smaller due to better delta compression of the added
objects. But instead we get a pack containing 26559 objects!!! And in
that lot, only 3054 (11%) are deltas. That makes for a pack that
started from 449135 bytes and grew to 72395940 bytes.
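To put those figures side by side, here is a small sketch that only redoes the arithmetic. Every number is copied from the output quoted in this message; the variable names are mine and purely illustrative.

```shell
#!/bin/sh
# Figures copied from the message; nothing here talks to git.
received_bytes=449135      # pack size right after the pull
repacked_bytes=72395940    # pack size after "git gc"
total_objects=26559        # objects written by the repack
delta_objects=3054         # of which stored as deltas

# What the repack was expected to contain: fetched objects plus
# the objects appended to complete the thin pack.
expected_objects=$((362 + 105))
surplus=$((total_objects - expected_objects))

# Share of deltas in the new pack, and how many times the pack grew.
delta_pct=$(LC_ALL=C awk -v d="$delta_objects" -v t="$total_objects" \
    'BEGIN { printf "%.1f", 100 * d / t }')
growth=$(LC_ALL=C awk -v p="$repacked_bytes" -v r="$received_bytes" \
    'BEGIN { printf "%.0f", p / r }')

echo "expected objects: $expected_objects"
echo "surplus objects:  $surplus"
echo "delta share:      ${delta_pct}%"
echo "size growth:      ${growth}x"
```

With these inputs the repack holds 26092 more objects than the 467 expected, about 11.5% of them are deltas, and the pack is roughly 161 times its original size.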
> My theory (totally unproven) is that the new pack has objects we
> copied from the .keep pack, because those objects were the best
> delta-bases for the loose objects we have deltafied and want to
> store in the new pack. Except they aren't yet packed in the new
> pack, so we pack them too. Tada, duplicates. :-\
Well, not exactly.
Let's see what happens here even before any packing is attempted:
|$ git rev-list --objects 492c2e4..9404ef0
|362
|
|$ git rev-list --objects --all \
| --unpacked=pack-6a3438b2702be06697023d80b77e67a73a0b0b5c.pack |
| wc -l
|26559
So this --unpacked= argument (whose undocumented semantics I still have
issues with) is certainly not doing what is expected.
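As a cross-check that does not depend on the `--unpacked=` form quoted above, one could count the reachable objects that are absent from the kept pack directly. This is only a rough sketch: the two function names are hypothetical helpers, not git commands, and it assumes `git show-index` prints the object name as the second space-separated field of each line. In the scenario of this message the count would be expected to be on the order of the few hundred objects received, not 26559.

```shell
#!/bin/sh
# Sketch: count reachable objects that are NOT stored in a given pack.
# Helper names are illustrative only.

# Print lines present in sorted file $1 but absent from sorted file $2.
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Usage: objects_outside_pack <path/to/pack-XXXX.idx>
# Must be run from inside the repository that owns the pack.
objects_outside_pack() {
    idx=$1
    tmp=$(mktemp -d) || return 1
    # Names of all objects reachable from any ref (first column only).
    git rev-list --objects --all | cut -d' ' -f1 |
        LC_ALL=C sort -u > "$tmp/reachable"
    # Names of the objects recorded in the pack index.
    git show-index < "$idx" | cut -d' ' -f2 |
        LC_ALL=C sort -u > "$tmp/kept"
    only_in_first "$tmp/reachable" "$tmp/kept" | wc -l
    rm -rf "$tmp"
}
```

Calling `objects_outside_pack` with the `.idx` file that sits next to the `.keep` file gives the number of objects a repack would actually need to write if everything in the kept pack were skipped.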
Nicolas