From: Kjetil Barvik <barvik@broadpark.no>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: git repack: --depth=100000 causing larger not smaler pack file?
Date: Mon, 23 Mar 2009 11:11:07 +0100
Message-ID: <86y6uwzgzo.fsf@broadpark.no>
In-Reply-To: <alpine.LFD.2.00.0903171608080.30483@xanadu.home>

Nicolas Pitre <nico@cam.org> writes:
> On Tue, 17 Mar 2009, Kjetil Barvik wrote:
>
>> aloha!
>>
>> Yesterday I ran the following command on the updated GIT repository:
>>
>>   git repack -adf --window=250000 --depth=100000
>>
>> After 280 minutes or so it finished, but the strange thing was that
>> the resulting pack-file was larger than before. I had expected that
>> it should be smaller, or at least the same size as before.
[snip]
>> I can think of one thing which is special with the "--depth=100000"
>> number, and that is that it is now larger than the total number of
>> objects in the pack, which is around 96000 to 97000, or so.
>
> No, the depth should have zero negative influence on the pack size.
> For tight compression, the larger the better. What this will impact
> though is runtime access to the pack data afterward. The deeper a
> given object is, the slower its access will be. But since the object
> recency order tends to put newer objects at the top of a delta chain,
> this should impact older objects more than recent ones.

I have done some more tests on a copy of the whole git/ directory (so
that I do not accidentally add or delete any objects/commits), and have
made the following table.

All pack file sizes, F, below were computed with the following git
command:

  git repack -adf --window=250000 --depth=D

 D (--depth) | F (bytes) | (F - F_prev) / (D - D_prev)
-------------|-----------|------------------------------
        5000 |  19129934 |
       10000 |  19128956 |    -978 /  5000 =  -0.1956
       15000 |  19126077 |   -2879 /  5000 =  -0.5758
       20000 |  19126077 |       0 /  5000 =   0
       25000 |  19126077 |       0 /  5000 =   0
       30000 |  19197575 |   71498 /  5000 =  14.2996
       45000 |  19312240 |  114665 / 15000 =   7.6443
       60000 |  19560083 |  247843 / 15000 =  16.5229
       75000 |  19803043 |  242960 / 15000 =  16.1973
       90000 |  19669923 | -133120 / 15000 =  -8.8747
       95000 |  20463780 |  793857 /  5000 = 158.7714
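
The last column can be rechecked mechanically. The following is a minimal Python sketch that recomputes the differences and slopes from the (D, F) pairs in the table; the names MEASUREMENTS and slopes are made up for this illustration and are not part of git.

```python
# (depth D, pack size F in bytes) pairs copied from the table above.
MEASUREMENTS = [
    (5000, 19129934),
    (10000, 19128956),
    (15000, 19126077),
    (20000, 19126077),
    (25000, 19126077),
    (30000, 19197575),
    (45000, 19312240),
    (60000, 19560083),
    (75000, 19803043),
    (90000, 19669923),
    (95000, 20463780),
]


def slopes(rows):
    """Return (D, F, F - F_prev, D - D_prev, slope) for every row after the first."""
    out = []
    for (d_prev, f_prev), (d, f) in zip(rows, rows[1:]):
        delta_f = f - f_prev
        delta_d = d - d_prev
        out.append((d, f, delta_f, delta_d, delta_f / delta_d))
    return out


if __name__ == "__main__":
    for d, f, delta_f, delta_d, slope in slopes(MEASUREMENTS):
        print(f"{d:>6} | {f:>9} | {delta_f:>7} / {delta_d:>5} = {slope:9.4f}")
```

Keeping the raw (D, F) pairs as the only input means any further measurement can be appended to the list without touching the arithmetic.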

From the table it seems that you get the smallest pack file (for this
particular repository) when the --depth value is somewhere between 15000
and 25000.  And when the --depth value was 95000, the resulting pack
file was 20463780 - 19126077 = 1 337 703 bytes (about 1.28 MiB), or 7%,
larger than that minimum.
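
As a quick check of that comparison, here is a small Python sketch that converts the difference between two pack sizes into bytes, MiB and a percentage of the smaller pack. The helper name overhead is hypothetical; the two sizes are the --depth=95000 and the --depth=15000..25000 values from the table.

```python
MIB = 1024 * 1024  # bytes per MiB


def overhead(size, baseline):
    """Return (extra bytes, extra MiB, percent larger than baseline)."""
    extra = size - baseline
    return extra, extra / MIB, 100.0 * extra / baseline


if __name__ == "__main__":
    extra, mib, pct = overhead(20463780, 19126077)
    print(f"{extra} bytes, {mib:.2f} MiB, {pct:.1f}% larger")
```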
> I doubt there is anything to debug. In this case the window size is
> used to evaluate a threshold slope for matching objects in the delta
> search. What we want is a broader delta tree more than a deep one in
> order to have more deltas with a lower depth limit. Therefore a size
> threshold is applied, based on the object distance in the delta search
> window (see commit c83f032e and the other ones referenced therein).
>
> By providing a big window value, the threshold slope becomes rather flat
> and ineffective, and this changes the delta match outcome. While delta
> selection is based on the uncompressed delta result, the compressed size
> of different deltas with the same size may vary. I suspect you might
> have been unlucky in that regard and this could explain the negative
> effect on the pack size.

From the table above it seems that I have been unlucky with _all_
--depth values above 25000 or so.

Question: is there some low-level GIT command I can run to compare 2
pack files, to maybe be able to see the reason behind the above table?
Maybe to see some details about how many deltas there are, how big each
one is, total sizes, etc.
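
One candidate, offered here as an assumption rather than something confirmed in this thread, is `git verify-pack -v <pack>.idx`. For each object it prints the object name, type, size, size in the pack file and offset, and for deltified objects additionally the delta depth and the base object name. The Python sketch below sums those columns so that two packs can be compared side by side; the function names summarize and pack_stats and the dictionary keys are invented for this example.

```python
import re
import subprocess
from collections import Counter

# Per-object lines start with a hex object name (40 chars for SHA-1).
OBJECT_LINE = re.compile(r"^[0-9a-f]{40,64} ")


def summarize(verify_pack_output):
    """Summarize the text printed by `git verify-pack -v`."""
    stats = {"objects": 0, "deltas": 0, "delta_bytes": 0,
             "base_bytes": 0, "max_depth": 0}
    depths = Counter()
    for line in verify_pack_output.splitlines():
        if not OBJECT_LINE.match(line):
            continue  # skip the "chain length" summary and the final "ok" line
        fields = line.split()
        in_pack = int(fields[3])  # bytes this object occupies in the pack
        stats["objects"] += 1
        if len(fields) >= 7:  # deltified: extra depth and base-name columns
            depth = int(fields[5])
            stats["deltas"] += 1
            stats["delta_bytes"] += in_pack
            stats["max_depth"] = max(stats["max_depth"], depth)
            depths[depth] += 1
        else:
            stats["base_bytes"] += in_pack
    stats["depth_histogram"] = dict(depths)
    return stats


def pack_stats(idx_path):
    """Run git verify-pack on one .idx file and summarize its output."""
    result = subprocess.run(["git", "verify-pack", "-v", idx_path],
                            check=True, capture_output=True, text=True)
    return summarize(result.stdout)
```

Calling pack_stats() on the .idx file produced by each repack run (for example one with --depth=20000 and one with --depth=95000) and comparing the two dictionaries would show whether the larger pack has fewer deltas, bigger deltas, or both.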

  -- kjetil

PS! I have the following in my $HOME/.gitconfig file:

  [repack]
        UseDeltaBaseOffset = true
  [gc]
        auto = 25
        autopacklimit = 1