git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Sivakumar Selvam <gerritcode@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: git repack command on larger pack file
Date: Tue, 27 Oct 2015 19:44:06 -0400
Message-ID: <20151027234406.GB4172@sigill.intra.peff.net>
In-Reply-To: <loom.20151027T025257-333@post.gmane.org>

On Tue, Oct 27, 2015 at 02:04:23AM +0000, Sivakumar Selvam wrote:

>    When I finished git repacking, I found 12 pack files with each 4 GB and
> the total size is 48 GB. Again I ran the same git repack command by just
> removing only --max-pack-size= parameter, the size of the single pack file
> is 66 GB.
> 
> git repack -A -b -d -q --depth=50 --window=10 abc.git
> 
> Now, I see the total size of the single abc.git has become 66 GB. Initially
> it was 34 GB, After using  --max-pack-size=4g it become 48 GB. When we
> remove the --max-pack-size=4g parameter and tried to create a single pack
> file now it become 66 GB.
>    
> Looks like once we do git repack with multiple pack files, we can't revert
> back to the original size.

Git tries to take some shortcuts when repacking: if two objects are in
the same pack but not deltas, it will not consider making deltas out of
them. The logic is that we would already have tried that while making the
original pack. But of course when you are doing weird things with the
packing parameters, that is not always a good assumption.

When doing experiments like this, add "-f" to your repack command-line
to avoid reusing deltas. The result should be much smaller (at the
expense of more CPU time to do the repack).
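
As a rough sketch of the above, this is the quoted command with "-f" added, wrapped so you can check the pack size afterwards. The function names repack_fresh and pack_size are made up for illustration, and I am assuming the repository is addressed with --git-dir rather than as a trailing argument:

```shell
# Hypothetical helper: rerun the repack from the quoted message, but with -f
# so that existing deltas are not reused and every delta is recomputed.
# Usage: repack_fresh <path-to-git-dir> [window]
repack_fresh() {
    repo=$1
    window=${2:-10}
    git --git-dir="$repo" repack -A -b -d -q -f --depth=50 --window="$window"
}

# Hypothetical helper: print the total size of all packs in KiB,
# as reported by "git count-objects -v".
pack_size() {
    git --git-dir="$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

Something like "repack_fresh abc.git && pack_size abc.git" should then show whether the result shrinks back toward the original size.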

I'd also recommend increasing "--window" if you can afford the extra CPU
during the repack. It can often produce smaller packs. And it has less
cost than you might think (e.g., window=20 is not twice as expensive as
window=10, because the work to access the objects is cached).  You can
also increase --depth, but I have never found it to be particularly
helpful for decreasing size[1].
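
If you want to see what a larger window buys you on your own data, a loop along these lines would do it. This is only a sketch: compare_windows is a hypothetical name, each pass is a full "-f" repack (so on a repository this size it will take a long time), and the numbers it prints are whatever "git count-objects -v" reports as size-pack:

```shell
# Hypothetical helper: repack once per requested --window value and print
# the resulting pack size. Each pass uses -f so deltas are recomputed
# instead of being reused from the previous pass.
# Usage: compare_windows <path-to-git-dir> <window>...
compare_windows() {
    repo=$1
    shift
    for w in "$@"; do
        git --git-dir="$repo" repack -a -d -f -q --depth=50 --window="$w" || return 1
        size=$(git --git-dir="$repo" count-objects -v |
               awk '/^size-pack:/ { print $2 }')
        printf 'window=%s size-pack=%s KiB\n' "$w" "$size"
    done
}
```

For example, "compare_windows abc.git 10 20 50" prints one line per window value.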

-Peff

[1] This is all theory, and I don't know how well git actually finds
    such deltas, but it is probably better to have a dense tree of
    deltas rather than long chains.  If you have a chain of N objects
    and would like to add object N+1 to it, you are probably not much
    worse off to base it on object N-1, creating a "fork" at N. The
    resulting objects should be less expensive to access for subsequent
    operations (as any time you want the Nth object, you have to resolve
    all parts of the chain, so shorter chains are better, and the delta
    cache is more likely to get a hit on that N-1 object).
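
    To put toy numbers on the chain-versus-tree argument, the sketch below counts how many deltas must be applied to read every object once. It is a made-up cost model and not how git picks delta bases: chain_cost and tree_cost are hypothetical names, the "tree" is simply a binary tree where object k is a delta against object k/2, and the delta cache is ignored entirely.

```shell
# Toy model: in a linear chain, object k is a delta against object k-1 and
# object 1 is the full base, so reading object k applies k-1 deltas.
chain_cost() {
    n=$1
    total=0
    k=1
    while [ "$k" -le "$n" ]; do
        total=$((total + k - 1))
        k=$((k + 1))
    done
    echo "$total"
}

# Toy model: in a binary tree, object k is a delta against object k/2
# (integer division), so its depth is the number of halvings down to 1.
tree_cost() {
    n=$1
    total=0
    k=1
    while [ "$k" -le "$n" ]; do
        depth=0
        j=$k
        while [ "$j" -gt 1 ]; do
            j=$((j / 2))
            depth=$((depth + 1))
        done
        total=$((total + depth))
        k=$((k + 1))
    done
    echo "$total"
}
```

    With 7 objects, "chain_cost 7" prints 21 and "tree_cost 7" prints 10; with 50 objects the gap grows to 1225 versus 193.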
