From: Nicolas Pitre <nico@cam.org>
To: Geert Bosch <bosch@adacore.com>
Cc: Jeff King <peff@peff.net>, Brandon Casey <casey@nrlssc.navy.mil>,
John Dlugosz <JDlugosz@TradeStation.com>,
git@vger.kernel.org
Subject: Re: dangling commits and blobs: is this normal?
Date: Thu, 23 Apr 2009 14:51:08 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0904231438280.6741@xanadu.home>
In-Reply-To: <064C9132-2E72-4665-A44D-A2F4194DAC2B@adacore.com>
On Thu, 23 Apr 2009, Geert Bosch wrote:
>
> On Apr 22, 2009, at 16:05, Jeff King wrote:
> > The other tradeoff, mentioned by Matthieu, is not about speed, but about
> > rollover of files on disk. I think he would be in favor of a less
> > optimal pack setup if it meant rewriting the largest packfile less
> > frequently.
> >
> > However, it may be reasonable to suggest that he just not manually "gc"
> > then. If he is not generating enough commits to warrant an auto-gc, then
> > he is probably not losing much by having loose objects. And if he is,
> > then auto-gc is already taking care of it.
>
> For large repositories with lots of large files, git spends too much
> time copying large packs for relatively little gain. This is obvious when
> you include a few dozen large objects in any repository.
> Currently, there is no limit to the number of times this data may
> be copied. In particular, the average amount of I/O needed for
> changes of size X depends linearly on the size of the total repository.
> So, the mere presence of a couple of large objects has a large distributed
> overhead.
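To make the linearity claim concrete, here is a toy cost model. It is an assumption, not git's actual accounting: every gc rewrites all objects into a single new pack, a pack's size is the sum of its objects' sizes, and delta compression is ignored. Under that model each repack copies the whole repository, so the I/O triggered by a change of size X grows with the total repository size. If the existing pack is frozen with a .keep file (Nicolas's suggestion below), only the objects accumulated since then get rewritten. Both function names are hypothetical.

```python
def bytes_copied_by_full_repacks(base_size, change_sizes):
    """Toy model: every gc rewrites the entire repository into one pack."""
    total = base_size
    copied = 0
    for x in change_sizes:
        total += x        # new objects of size x arrive
        copied += total   # and the whole repository is rewritten
    return copied


def bytes_copied_with_keep(base_size, change_sizes):
    """Toy model: the base pack has a .keep file, so repacks never touch it.

    Objects added after the base pack still accumulate into one non-kept
    pack that is rewritten on each gc; base_size no longer matters.
    """
    unkept = 0
    copied = 0
    for x in change_sizes:
        unkept += x
        copied += unkept
    return copied
```

With a 1000-byte base and three 1-byte changes, the first model copies 1001 + 1002 + 1003 bytes, while the second copies only 1 + 2 + 3.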
You can put a limit on the number of times this data is copied, and even
set the limit to zero. Just add a .keep file next to your .pack file
(same base name, .keep extension) and that data will be set in stone.
Any further repack will consider only the objects you have added since.
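A minimal Python sketch of that step, assuming the standard layout where packs live in .git/objects/pack as pack-<id>.pack. It creates a pack-<id>.keep file beside every pack that does not already have one. The function name keep_all_packs and the message written into the file are hypothetical; the file's contents are informational, and it is the file's presence that makes repack leave the pack alone. It should not be run while a repack is in progress.

```python
import os


def keep_all_packs(pack_dir, reason="frozen by hand\n"):
    """Create pack-<id>.keep next to every pack-<id>.pack lacking one.

    Returns the list of .keep paths that were created.
    """
    created = []
    for name in sorted(os.listdir(pack_dir)):
        if not (name.startswith("pack-") and name.endswith(".pack")):
            continue  # skip .idx files, existing .keep files, etc.
        keep_path = os.path.join(pack_dir, name[: -len(".pack")] + ".keep")
        if os.path.exists(keep_path):
            continue  # already kept
        with open(keep_path, "w") as f:
            f.write(reason)
        created.append(keep_path)
    return created
```

Usage, assuming the current directory is the top of a work tree: `keep_all_packs(".git/objects/pack")`.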
> Wouldn't it be better to have a maximum of N packs, named
> pack_0 .. pack_(N - 1), in the repository with each pack_i being
> between 2^i and 2^(i+1)-1 bytes large? We could even dispense
> completely with loose objects and instead have each git operation
> create a single new pack.
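One way to read this proposal (an interpretation; the merge rule is not spelled out in the message) is as a binary counter. Slot i holds at most one pack whose size lies in [2^i, 2^(i+1)-1]. A new pack that lands in an occupied slot is merged with the occupant, and the result carries upward until it finds a free slot. Assuming a merged pack is simply the sum of its inputs, two packs from slot i produce at least 2^(i+1) bytes, so every merge moves data up at least one slot. Any byte is therefore rewritten at most about log2(total size) times, which is the bound on copying asked for above. The names slot_for and add_pack are hypothetical.

```python
def slot_for(size):
    """Slot i holds packs with 2**i <= size <= 2**(i+1) - 1."""
    if size < 1:
        raise ValueError("pack size must be positive")
    return size.bit_length() - 1


def add_pack(slots, new_size):
    """Insert a new pack into slots (dict: slot index -> pack size).

    Merges with occupied slots like a binary carry and returns the number
    of bytes rewritten. Assumes merging adds sizes (no delta savings).
    """
    size = new_size
    copied = 0
    while True:
        i = slot_for(size)
        if i not in slots:
            slots[i] = size
            return copied
        size += slots.pop(i)  # rewrite both packs into one bigger pack
        copied += size
```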
I already suggested that for large enough objects. For small objects
it makes no sense: you may accumulate too many packs, and each one
would have to be opened to find out whether it contains the desired
object, whereas currently a simple directory lookup is enough.
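The cost difference can be sketched as follows. A loose object's path is derived directly from its SHA-1 (objects/<first two hex digits>/<remaining 38>), so finding it takes a single filesystem lookup. With many small packs, each pack's index must be consulted in turn until the object turns up, so the work grows with the number of packs. In this sketch each index is modelled as a Python set, which is a simplification (real .idx files are sorted tables with a 256-entry fan-out that are binary-searched), and find_in_packs is a hypothetical name.

```python
import os


def loose_object_path(objects_dir, hex_id):
    """Loose objects live at objects/<2 hex digits>/<remaining 38>."""
    return os.path.join(objects_dir, hex_id[:2], hex_id[2:])


def find_in_packs(hex_id, pack_indexes):
    """Probe each (name, index) pair in order until hex_id is found.

    Returns (pack_name or None, number_of_indexes_probed).
    """
    probes = 0
    for name, index in pack_indexes:
        probes += 1
        if hex_id in index:
            return name, probes
    return None, probes
```

A missing object is the worst case: every index has to be probed before concluding that the object is not packed.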
> number of packs in the way described is useful and will lead to significant
> speedups, especially during large imports that currently require frequent
> repacking of the entire repository.
Others commented on that issue already.
Nicolas