From: Nicolas Pitre <nico@cam.org>
To: Geert Bosch <bosch@adacore.com>
Cc: Jeff King <peff@peff.net>, Brandon Casey <casey@nrlssc.navy.mil>,
	John Dlugosz <JDlugosz@TradeStation.com>,
	git@vger.kernel.org
Subject: Re: dangling commits and blobs: is this normal?
Date: Thu, 23 Apr 2009 14:51:08 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0904231438280.6741@xanadu.home>
In-Reply-To: <064C9132-2E72-4665-A44D-A2F4194DAC2B@adacore.com>

On Thu, 23 Apr 2009, Geert Bosch wrote:

> 
> On Apr 22, 2009, at 16:05, Jeff King wrote:
> > The other tradeoff, mentioned by Matthieu, is not about speed, but about
> > rollover of files on disk. I think he would be in favor of a less
> > optimal pack setup if it meant rewriting the largest packfile less
> > frequently.
> > 
> > However, it may be reasonable to suggest that he just not manually "gc"
> > then. If he is not generating enough commits to warrant an auto-gc, then
> > he is probably not losing much by having loose objects. And if he is,
> > then auto-gc is already taking care of it.
> 
> For large repositories with lots of large files, git spends too much
> time copying large packs for relatively little gain. This is obvious when
> you include a few dozen large objects in any repository.
> Currently, there is no limit to the number of times this data may
> be copied. In particular, the average amount of I/O needed for
> changes of size X depends linearly on the size of the total repository.
> So, the mere presence of a couple of large objects has a large distributed
> overhead.

You can put a limit on the number of times this data is copied, and even
set the limit to zero.  Just add a .keep file next to your .pack file
(same base name) and that pack will be set in stone.  Any further repack
will consider only those newly added objects you may have.
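
As an illustration only, here is a minimal Python sketch of the .keep
approach.  It assumes the usual repository layout, with packs stored as
objects/pack/pack-*.pack under the git directory; the function name
mark_packs_kept and the text written into the file are hypothetical and
not part of git itself.

```python
from pathlib import Path


def mark_packs_kept(git_dir, reason="kept: do not repack\n"):
    """Create a .keep file next to every pack-*.pack in git_dir/objects/pack.

    Assumption: git_dir is the repository's git directory (e.g. ".git").
    Returns the list of .keep paths that were newly created.
    """
    pack_dir = Path(git_dir) / "objects" / "pack"
    created = []
    for pack in sorted(pack_dir.glob("pack-*.pack")):
        # pack-<id>.pack -> pack-<id>.keep (same base name, new suffix)
        keep = pack.with_suffix(".keep")
        if not keep.exists():
            keep.write_text(reason)
            created.append(keep)
    return created
```

Calling mark_packs_kept(".git") once after a full repack would mark the
packs present at that moment, so that later repacks only have the newer
objects to deal with.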

> Wouldn't it be better to have a maximum of N packs, named
> pack_0 .. pack_(N - 1),  in the repository with each pack_i being
> between 2^i and 2^(i+1)-1 bytes large? We could even dispense
> completely with loose objects and instead have each git operation
> create a single new pack.

I suggested that already for large enough objects.  For small objects
this makes no sense, as you may accumulate too many packs, and each one
would need to be opened to find out whether it contains the desired
object, whereas currently a simple directory lookup is enough.
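
For reference, the size-class scheme quoted above can be sketched in a
few lines of Python.  This is only a rough model, not anything git
implements: the names size_class and add_pack are hypothetical, and the
sketch assumes that merging two packs yields a pack whose size is the
sum of the two, which ignores any gain or loss from delta compression.

```python
def size_class(n_bytes):
    """Return i such that 2**i <= n_bytes <= 2**(i + 1) - 1."""
    if n_bytes < 1:
        raise ValueError("pack size must be at least 1 byte")
    return n_bytes.bit_length() - 1


def add_pack(slots, n_bytes):
    """Add a new pack of n_bytes to slots, a dict mapping class -> size.

    When the target class is already occupied, the two packs are merged
    (assumed size: the sum) and the result moves to its own class; this
    repeats until a free class is found.  Returns the number of bytes
    written while merging, as a crude measure of the copying cost.
    """
    rewritten = 0
    size = n_bytes
    i = size_class(size)
    while i in slots:
        size += slots.pop(i)
        rewritten += size  # the merged pack is written out in full
        i = size_class(size)
    slots[i] = size
    return rewritten
```

In this model a large pack is only rewritten when a pack of comparable
size collides with it, but every object lookup may have to consult up to
N pack indexes.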

> number of packs in the way described is useful and will lead to significant
> speedups, especially during large imports that currently require frequent
> repacking of the entire repository.

Others commented on that issue already.


Nicolas
