git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Nicolas Pitre <nico@cam.org>
Cc: "Robin H. Johnson" <robbat2@gentoo.org>, git@vger.kernel.org
Subject: Re: Weird growth in packfile during initial push
Date: Wed, 29 Apr 2009 16:57:37 -0700
Message-ID: <7vy6tj109a.fsf@gitster.siamese.dyndns.org>
In-Reply-To: alpine.LFD.2.00.0904151443030.6741@xanadu.home

Nicolas Pitre <nico@cam.org> writes:

>> $ git push origin master:master
>> Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
>> Counting objects: 4969800, done.
>> Delta compression using up to 8 threads.
>> Compressing objects: 100% (1217809/1217809), done.
>> Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
>> Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)
>
> Here we know for sure that all objects were directly reused, so no 
> attempt at recompressing them was done.  The only thing that 
> pack-objects might do in this case in addition to directly streaming the 
> existing pack is to convert delta object headers from OFS_DELTA to 
> REF_DELTA.
>
>> $ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
>> total 966876
>> drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
>> drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
>> -r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
>> -r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack
>
> Let's see if my theory stands:
>
> 	849936308 - 786336481 = 63599827
> 	63599827 / 3735812 = 17.02
>
> Hence an average difference of 17 bytes per delta.  Given that REF_DELTA 
> objects have a 20-byte SHA1 base reference which is replaced with a 
> variable length encoding of a pack offset in the OFS_DELTA case, we're 
> talking about 2.98 bytes for that offset encoding which feels about 
> right.
>
> [...]
>
> And the code matches this theory as well.  Can you try this patch if you 
> have a chance?
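For reference, here is a minimal Python sketch of the OFS_DELTA base-offset encoding as described in Git's pack-format documentation. The distance back to the base object is stored most-significant group first, in 7-bit groups. The high bit is set on every byte except the last, and each continuation adds one so that no value has two encodings. A REF_DELTA instead names its base by a full 20-byte SHA-1, so converting one entry from OFS_DELTA to REF_DELTA grows it by 20 minus the encoded offset length. The function names are hypothetical; this is an illustration, not git's code.

```python
REF_DELTA_BASE_LEN = 20  # a REF_DELTA names its base by a full 20-byte SHA-1


def encode_ofs(ofs: int) -> bytes:
    """Encode a non-negative base distance the way OFS_DELTA entries store it."""
    if ofs < 0:
        raise ValueError("offset must be non-negative")
    out = [ofs & 0x7F]            # final byte: high bit clear
    ofs >>= 7
    while ofs:
        ofs -= 1                  # the +1 bias that makes every encoding unique
        out.append(0x80 | (ofs & 0x7F))
        ofs >>= 7
    return bytes(reversed(out))   # most significant 7-bit group first


def decode_ofs(data: bytes) -> tuple[int, int]:
    """Return (offset, bytes consumed) for an encoded OFS_DELTA base distance."""
    i = 0
    c = data[i]
    i += 1
    ofs = c & 0x7F
    while c & 0x80:               # high bit set: another byte follows
        c = data[i]
        i += 1
        ofs = ((ofs + 1) << 7) | (c & 0x7F)
    return ofs, i


def bytes_saved(ofs: int) -> int:
    """Size difference of one delta entry stored as REF_DELTA vs OFS_DELTA."""
    return REF_DELTA_BASE_LEN - len(encode_ofs(ofs))
```

With this scheme, distances up to 127 take one byte, up to 16511 two bytes, and up to 2113663 (about 2 MiB) three bytes. An average offset length near 3 bytes, as in the analysis above, is therefore consistent with most bases lying within roughly 2 MiB of their deltas.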

Is there any progress on this?

I think you did a very clear analysis.  An 8% size reduction is too
large to ignore, and using delta offsets should also help runtime
efficiency, right?
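The quoted arithmetic can be rechecked with a few lines of Python. The 786336481-byte figure is taken from the quoted subtraction; the context explaining it was trimmed from the quote, so treating it as the size of the original OFS_DELTA pack is an assumption. Depending on which pack is the baseline, the difference works out to roughly 8.1% or 7.5%.

```python
# Sizes in bytes. ofs_delta_pack is ASSUMED to be the original OFS_DELTA pack.
ref_delta_pack = 849_936_308  # .pack file after the push (from the ls output)
ofs_delta_pack = 786_336_481  # figure used in the quoted subtraction
deltas = 3_735_812            # from "Total 4969800 (delta 3735812)"

growth = ref_delta_pack - ofs_delta_pack   # extra bytes after the push
per_delta = growth / deltas                # average growth per delta entry
offset_len = 20 - per_delta                # implied average OFS_DELTA offset length
pct_of_ofs_pack = 100 * growth / ofs_delta_pack
pct_of_ref_pack = 100 * growth / ref_delta_pack

print(f"growth: {growth} bytes")
print(f"per delta: {per_delta:.2f} bytes, implied offset length: {offset_len:.2f} bytes")
print(f"{pct_of_ofs_pack:.1f}% of the OFS_DELTA pack, "
      f"{pct_of_ref_pack:.1f}% of the REF_DELTA pack")
```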


Thread overview: 16+ messages
2009-04-15 18:27 Weird growth in packfile during initial push Robin H. Johnson
2009-04-15 19:51 ` Nicolas Pitre
2009-04-29 23:57   ` Junio C Hamano [this message]
2009-04-30  2:52     ` Nicolas Pitre
2009-05-01  6:17     ` Robin H. Johnson
2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-01 23:49       ` Junio C Hamano
2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
2009-05-02  1:14           ` A Large Angry SCM
2009-05-02  1:39           ` Nicolas Pitre
2009-05-02  1:59             ` Shawn O. Pearce
2009-05-02 16:56             ` Ealdwulf Wuffinga
2009-05-02  1:40           ` Michael Witten
2009-05-02  0:24         ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-04 22:11       ` Shawn O. Pearce
2009-05-04 22:30         ` Shawn O. Pearce
