git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Eric Wong <e@80x24.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC] gc: correct gc.autoPackLimit documentation
Date: Fri, 24 Jun 2016 22:06:20 -0400
Message-ID: <20160625020620.GA31290@sigill.intra.peff.net>
In-Reply-To: <20160625011450.GA14293@dcvr.yhbt.net>

On Sat, Jun 25, 2016 at 01:14:50AM +0000, Eric Wong wrote:

> I'm not sure if this is the best approach, or if changing
> too_many_packs can be done without causing problems for
> hosts of big repos.
> 
> -------8<-----
> Subject: [PATCH] gc: correct gc.autoPackLimit documentation
> 
> I want to ensure there is only one pack in my repo to take
> advantage of pack bitmaps.  Based on my reading of the
> documentation, I configured gc.autoPackLimit=1 which led to
> "gc --auto" constantly trying to repack on every invocation.

I'm not sure if you might be misinterpreting earlier advice on bitmaps
here. At the time of packing, bitmaps need all of the objects to go into
a single pack (they cannot handle a case where one object in the pack
can reach another object that is not in the pack). But that is easily
done with "git repack -adb".

After that packing, you can add new packs that do not have bitmaps, and
the bitmaps will gracefully degrade. E.g., imagine master was at tip X
when you repacked with bitmaps, and now somebody has pushed to make it
tip Y.  Somebody then clones, asking for Y. The bitmap code will start
at Y and walk backwards. When it hits X, it stops walking as it can fill
in the rest of the reachability from there.

So you do have to walk X..Y the old-fashioned way, but that's generally
not a big problem for a few pushes.
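
A toy sketch of that degradation, in Python. This is not git's actual
bitmap code: it assumes a bitmap is simply the set of commits reachable
from the bitmapped commit (real bitmaps cover every object in the pack),
and the names reachable(), parents and bitmaps are made up for
illustration.

```python
def reachable(tip, parents, bitmaps):
    """Return (everything reachable from tip, commits walked by hand)."""
    seen = set()
    walked = []
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        if commit in bitmaps:
            # The bitmap already records everything reachable from here,
            # so merge it in and stop walking down this path.
            seen |= bitmaps[commit]
            continue
        seen.add(commit)
        walked.append(commit)
        stack.extend(parents.get(commit, []))
    return seen, walked


# History: A <- B <- X <- C <- Y, with the bitmapped repack done at X.
parents = {"Y": ["C"], "C": ["X"], "X": ["B"], "B": ["A"], "A": []}
bitmaps = {"X": {"X", "B", "A"}}

found, walked = reachable("Y", parents, bitmaps)
print(sorted(found))  # ['A', 'B', 'C', 'X', 'Y']
print(walked)         # ['Y', 'C']
```

Only Y and C (the X..Y range) are visited one by one; everything at or
below X comes from the bitmap in a single step.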

IOW, I think trying to repack on every single push is probably overkill.
Yes, it will buy you a little savings on fetch requests, but whether it
is worthwhile to pack depends on:

 - how big the push was (e.g., 2 commits versus thousands; the bigger
   it is, the more you save per fetch)

 - how big the repo is (the bigger it is, the more it costs to do the
   repack; packing is linear-ish effort in the number of objects in the
   repo)

 - how often you get fetches versus pushes (your cost is amortized
   across all the fetches)
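
The three factors above can be put into a rough back-of-the-envelope
model. The sketch below is only an illustration: repack_pays_off() and
both cost constants are hypothetical, the units are arbitrary, and real
costs would have to be measured on the host in question.

```python
def repack_pays_off(unpacked_commits, repo_objects, fetches_per_push,
                    walk_cost_per_commit=1.0, pack_cost_per_object=0.01):
    """Compare the one-time repack cost with the walking saved over fetches.

    Both cost constants are made-up units, chosen only for illustration.
    """
    # Each fetch has to walk the commits that are not covered by bitmaps.
    saved_per_fetch = unpacked_commits * walk_cost_per_commit
    # The saving is amortized across every fetch that follows the push.
    total_saved = saved_per_fetch * fetches_per_push
    # Repacking is modeled as linear in the number of objects in the repo.
    repack_cost = repo_objects * pack_cost_per_object
    return total_saved > repack_cost


print(repack_pays_off(2, 1_000_000, 10))     # False: tiny push, big repo
print(repack_pays_off(5000, 1_000_000, 10))  # True: large push, same repo
```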

There are numbers where it can be worth it to pack really aggressively,
but I doubt it's common. At GitHub we use a combination of number of
packs (and we try to keep it under 50) and size of objects not in the
"main" pack (I did a bunch of fancy logging and analysis of object
counts, bytes in packs, etc, at one point, and we basically realized
that for the common cases, all of the interesting metrics are roughly
proportional to the number of bytes that could be moved into the main
pack).
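
A trigger built on those two signals might look roughly like the
following. This is a guess at the shape of such a policy, not GitHub's
actual code: the 50-pack limit comes from the text above, while the byte
threshold, the Pack type, the choice of the largest pack as the "main"
one, and should_repack() itself are all assumptions. Loose objects are
ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    name: str
    size_bytes: int


def should_repack(packs, max_packs=50, max_outside_bytes=100 * 1024 * 1024):
    """Decide whether to repack based on pack count and bytes outside
    the main pack. The byte threshold is a hypothetical default."""
    if not packs:
        return False
    if len(packs) >= max_packs:
        return True
    # Assumption: the largest pack is the "main" one.
    main = max(packs, key=lambda p: p.size_bytes)
    outside = sum(p.size_bytes for p in packs) - main.size_bytes
    return outside >= max_outside_bytes


packs = [Pack("main", 2_000_000_000), Pack("push-1", 40_000)]
print(should_repack(packs))  # False: two packs, little data outside main
```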

That's neither here nor there for the off-by-one in gc or its
documentation, of course, but just FYI.

-Peff
