git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Bill Lear <rael@zopyra.com>
Cc: Andreas Ericsson <ae@op5.se>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	git@vger.kernel.org
Subject: Re: When to repack?
Date: Wed, 31 Jan 2007 10:36:47 -0500
Message-ID: <20070131153647.GA21888@spearce.org>
In-Reply-To: <Pine.LNX.4.63.0701311617360.22628@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Wed, 31 Jan 2007, Andreas Ericsson wrote:
> > Bill Lear wrote:
> > > We have a company repo used by many people throughout the day.  When/how
> > > can I repack this?  I have come to adopt this approach:
> 
> AFAIR this case is handled gracefully by git. If the object it is still 
> accessing moves to a(nother) pack, git will still find it.

No need for AFAIR; it's definitely true.  `git gc` is completely safe
on a live repository.  Run it at will.  Toss it in a cron job.  Whatever.

What is *not* safe is `git gc --prune`.  Don't run that on an
active repository.
 
> > On a side-note, this is a grade A example of something that should 
> > typically be done sunday night at 4am.

Possibly.  It hardly matters when you run it, except on very large
repositories where the repack would take more than a few minutes.

Really, just toss something like the following in a cron job that
runs once a week:

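	#!/bin/sh
	# Run gc on every bare repository under /path/to/gits.
	for g in /path/to/gits/*.git
	do
	  [ -d "$g" ] || continue   # skip if the glob matched nothing
	  git --git-dir="$g" gc
	done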

If you want to get fancy, use the output of `git count-objects -v`:

	count: 325
	size: 2332
	in-pack: 40894
	packs: 1
	prune-packable: 0
	garbage: 0

I look for a count over 2000 or more than 5 packs.  If either is
true, I run gc; otherwise I leave that repository alone that week.
That is actually packing more often than I really need to.  On any
UNIX system you can probably let those climb past 5,000 loose objects
or 20 packs and still not see a real performance problem.
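A minimal sketch of that check as a weekly job, assuming the same placeholder layout (/path/to/gits/*.git) as the script above and a POSIX sh and awk.  `needs_gc` is a hypothetical helper name, and the 2000 / 5 thresholds are simply the figures quoted above, not anything git itself enforces:

```shell
#!/bin/sh
# Weekly job: only gc repositories whose `git count-objects -v`
# output shows more than 2000 loose objects or more than 5 packs.
# /path/to/gits is a placeholder path.

# needs_gc (hypothetical helper) reads `git count-objects -v` output
# on stdin and exits 0 (true) when either threshold is exceeded.
needs_gc() {
	awk -F': ' '
		$1 == "count" { count = $2 + 0 }
		$1 == "packs" { packs = $2 + 0 }
		END { if (count > 2000 || packs > 5) exit 0; exit 1 }
	'
}

for g in /path/to/gits/*.git
do
	# Skip the literal pattern if the glob matched nothing.
	[ -d "$g" ] || continue
	if git --git-dir="$g" count-objects -v | needs_gc
	then
		git --git-dir="$g" gc
	fi
done
```

Because the decision is made from the piped text alone, `needs_gc` can be tried without a repository, e.g. `printf 'count: 2500\npacks: 1\n' | needs_gc && echo repack`.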

In general, repacks and network transfers (or basically any
operation) take longer as the number of loose objects increases
(that's the count field in `git count-objects -v`).  Keep it below
~2000 and `git gc` typically finishes in a minute or two, even for
200 MiB repositories.

-- 
Shawn.
