From: Kjetil Barvik <barvik@broadpark.no>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: git repack: --depth=100000 causing larger not smaler pack file?
Date: Mon, 23 Mar 2009 11:11:07 +0100
Message-ID: <86y6uwzgzo.fsf@broadpark.no>
In-Reply-To: <alpine.LFD.2.00.0903171608080.30483@xanadu.home>

Nicolas Pitre <nico@cam.org> writes:

> On Tue, 17 Mar 2009, Kjetil Barvik wrote:
>
>>   aloha!
>> 
>>   Yesterday I ran the following command on the updated Git repository:
>> 
>>     git repack -adf --window=250000 --depth=100000
>> 
>>   After 280 minutes or so it finished, but the strange thing was that
>>   the resulting pack-file was larger than before.  I had expected that
>>   it should be smaller, or at least the same size as before.
  [snip]
>>   I can think of one thing which is special with the "--depth=100000"
>>   number, and that is that it is now larger than the total number of
>>   objects in the pack, which is around 96000 to 97000, or so.
>
> No, the depth should have zero negative influence on the pack size.  
> For tight compression, the larger the better.  What this will impact 
> though is runtime access to the pack data afterward.  The deeper a 
> given object is, the slower its access will be.  But since the object 
> recency order tends to put newer objects at the top of a delta chain, 
> this should impact older objects more than recent ones.

  I have done some more tests, and have copied the whole git/ directory
  to a new directory (such that I do not accidentally add or delete any
  objects/commits), and have made the following table:

  All pack file sizes, F, below were computed with the following git
  command:

      git repack -adf --window=250000 --depth=D

     D   |     F      | (F - F_prev) / (D - D_prev)
  -------|------------|----------------------------
    5000 |  19129934  |
   10000 |  19128956  |    -978 /  5000 =  -0.1956
   15000 |  19126077  |   -2879 /  5000 =  -0.5758
   20000 |  19126077  |       0 /  5000 =   0
   25000 |  19126077  |       0 /  5000 =   0
   30000 |  19197575  |   71498 /  5000 =  14.2996
   45000 |  19312240  |  114665 / 15000 =   7.6443
   60000 |  19560083  |  247843 / 15000 =  16.5229
   75000 |  19803043  |  242960 / 15000 =  16.1973
   90000 |  19669923  | -133120 / 15000 =  -8.8747
   95000 |  20463780  |  793857 /  5000 = 158.7714

  From the table it seems that you get the smallest pack file (for this
  particular repository) when the --depth value is somewhere between
  15000 and 25000.  And, when the --depth value was 95000, the resulting
  pack file was 20463780 - 19126077 = 1 337 703 bytes (about 1.28 MiB,
  or 7%) larger than that.
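
  The last column of the table and the size difference can be rechecked
  with a few lines of code.  A minimal sketch in Python, where the
  (D, F) pairs are copied from the table and the names `rows` and
  `slopes` are only illustrative:

```python
# (D, F) pairs copied from the table: --depth value and pack size in bytes.
rows = [
    (5000, 19129934), (10000, 19128956), (15000, 19126077),
    (20000, 19126077), (25000, 19126077), (30000, 19197575),
    (45000, 19312240), (60000, 19560083), (75000, 19803043),
    (90000, 19669923), (95000, 20463780),
]


def slopes(rows):
    """Return (D, F, F - F_prev, D - D_prev, slope) for every row but the first."""
    out = []
    for (d_prev, f_prev), (d, f) in zip(rows, rows[1:]):
        out.append((d, f, f - f_prev, d - d_prev, (f - f_prev) / (d - d_prev)))
    return out


for d, f, df, dd, slope in slopes(rows):
    print(f"{d:6d} | {f:9d} | {df:8d} / {dd:5d} = {slope:9.4f}")

# Compare the largest depth tested against the smallest pack in the table.
smallest = min(f for _, f in rows)
extra = rows[-1][1] - smallest
print(f"extra: {extra} bytes, {extra / 2**20:.2f} MiB, "
      f"{100 * extra / smallest:.1f}% of the smallest pack")
```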

> I doubt there is anything to debug.  In this case the window size is 
> used to evaluate a threshold slope for matching objects in the delta 
> search.  What we want is a broader delta tree more than a deep one in 
> order to have more deltas with a lower depth limit.  Therefore a size 
> threshold is applied, based on the object distance in the delta search 
> window (see commit c83f032e and the other ones referenced therein).
>
> By providing a big window value, the threshold slope becomes rather flat 
> and ineffective, and this changes the delta match outcome.  While delta 
> selection is based on the uncompressed delta result, the compressed size 
> of different deltas with the same size may vary.  I suspect you might 
> have been unlucky in that regard and this could explain the negative 
> effect on the pack size.

  From the table above it seems that I have been unlucky with _all_
  --depth values above 25000 or so.

  Question: is there some low-level Git command I can run to compare two
  pack files, so that I can maybe see the reason behind the above table?
  For example, to see details such as how many deltas there are, how big
  each one is, total sizes, etc.
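
  One possible starting point (an assumption on my side, not something
  confirmed in this thread) is `git verify-pack -v`, which prints one
  line per object: SHA-1, type, size, size in the pack file and offset,
  followed by the delta depth and the base SHA-1 for deltified objects.
  The sketch below is Python; the helper names `summarize` and
  `run_verify_pack` are made up for illustration.  It adds up those
  lines so that two packs can be compared side by side:

```python
import subprocess
import sys
from collections import Counter

OBJECT_TYPES = {"commit", "tree", "blob", "tag"}


def summarize(verify_pack_output):
    """Summarize the per-object lines of `git verify-pack -v` output.

    Object lines have 5 fields (non-delta) or 7 fields (delta: the extra
    two are the delta depth and the base SHA-1).  The trailing histogram
    and "ok" lines are skipped.
    """
    summary = {"objects": 0, "deltas": 0, "pack_bytes": 0,
               "delta_pack_bytes": 0, "max_depth": 0,
               "depth_hist": Counter()}
    for line in verify_pack_output.splitlines():
        fields = line.split()
        if len(fields) not in (5, 7) or fields[1] not in OBJECT_TYPES:
            continue
        size_in_pack = int(fields[3])
        summary["objects"] += 1
        summary["pack_bytes"] += size_in_pack  # excludes pack header/trailer
        if len(fields) == 7:
            depth = int(fields[5])
            summary["deltas"] += 1
            summary["delta_pack_bytes"] += size_in_pack
            summary["max_depth"] = max(summary["max_depth"], depth)
            summary["depth_hist"][depth] += 1
    return summary


def run_verify_pack(idx_path):
    """Run `git verify-pack -v` on a pack .idx file and return its stdout."""
    result = subprocess.run(["git", "verify-pack", "-v", idx_path],
                            check=True, capture_output=True, text=True)
    return result.stdout


# Usage (hypothetical file name): python compare_packs.py A.idx B.idx
if __name__ == "__main__" and len(sys.argv) == 3:
    for idx_path in sys.argv[1:]:
        print(idx_path, summarize(run_verify_pack(idx_path)))
```

  Comparing `deltas`, `delta_pack_bytes` and `depth_hist` between a
  --depth=20000 pack and a --depth=95000 pack might show whether the
  extra bytes come from fewer deltas or from larger ones.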

  -- kjetil

  PS!  I have the following in my $HOME/.gitconfig file:

[repack]
	UseDeltaBaseOffset = true
[gc]
	auto = 25
	autopacklimit = 1
