git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@fluxnic.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Jay Soffian <jaysoffian@gmail.com>,
	git <git@vger.kernel.org>, Shawn Pearce <spearce@spearce.org>
Subject: Re: gc --aggressive
Date: Sat, 28 Apr 2012 12:42:26 -0400 (EDT)
Message-ID: <alpine.LFD.2.02.1204281151290.21030@xanadu.home>
In-Reply-To: <7vmx6am1h9.fsf@alter.siamese.dyndns.org>

[ coming late to this thread -- thanks to peff who drew my attention to it ]

On Tue, 17 Apr 2012, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Apr 17, 2012 at 03:17:28PM -0700, Junio C Hamano wrote:
> >
> >> > How many cores are there on this box? Have you tried setting
> >> > pack.windowMemory to (12 / # of cores) or thereabouts?
> >> 
> >> Hrm, from the end-user's point of view, it appears that pack.windowMemory
> >> ought to mean the total without having to worry about the division of it
> >> across threads (which the implementation should be responsible for).
> >
> > Agreed. I had to look in the code to check which it meant. I'm not sure
> > we can change it without regressing existing users, though.
> 
> This is a tangent, but I noticed that the canned settings for "aggressive"
> use an arbitrarily hardcoded value of depth=250 and window=250 (tweakable
> with gc.aggressiveWindow).
> 
> Even though a shallower depth does cause base candidates with too long a
> chain hanging to be evicted prematurely while it is still in window and
> will lead to smaller memory consumption, I do not think the value of
> "depth" affects the pack-time memory consumption too much.  But the
> runtime performance of the resulting pack may not be great (in the worst
> case you would have to undelta 249 times to get to the object data).  We
> may want to loosen it a bit.

I think people are having misconceptions about the definition of the 
word "aggressive".

This option is, well, aggressive.  By definition this is not meant to be 
"nice".  This is not meant to be fast, or light on memory usage, etc.  
This means "achieve as much damage as you can" to reduce the pack size.

If people are using it every night then they must be masochists, or 
attracted by violence, or getting a bit too casual with word 
definitions.

So if being --aggressive hurts, then don't do it.

If people want a loosened version, it would be more appropriate to 
introduce a --mild, or --bold, or --disruptive option.  In the same 
vein, an --insane option could even be introduced to go further 
than --aggressive.

This being said, this is no excuse for regressions though.  If git is 
eating up much more memory than it used to, given the same 
repository and repacking parameters as before, then this certainly 
needs fixing.  But making --aggressive less so is not a fix.

> Also it might make sense to make the window size a bit more flexible
> depending on the nature of your history (you would get bigger benefit with
> larger window when your history has fine grained commits; if there are not
> many few-liner commits, larger window may not help you that much).

How do you detect the nature of a repository's history?  That's the hard 
part, because it should be auto-detected: most users won't make a good 
guess at the best parameter value to use.

Anyway, I think that the window size in terms of objects is a bad 
parameter.  Historically that is the first thing we implemented. But the 
window _memory_ usage is probably a better setting to use.  The delta 
search cost is directly proportional to the amount of data to process 
and that can be controlled with --window-memory, with the ability to 
scale up and down the number of objects in the window.  Keeping the 
number of objects constant makes memory usage totally random since this 
depends on the repository content, and the computing cost to process it 
is highly unpredictable. This is very counter-intuitive for users.
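
To make the contrast concrete, here is a toy model of the two policies. 
It is only a sketch and not git's actual delta-search code: the function 
name peak_window_usage and the object sizes are made up for illustration. 
It slides a window over a stream of object sizes and reports the peak 
memory and peak object count under an object-count limit or a byte limit.

```python
from collections import deque


def peak_window_usage(sizes, max_objects=None, max_bytes=None):
    """Slide a window over object sizes; return (peak_bytes, peak_objects).

    Hypothetical model: the newest object is always kept, and the oldest
    entries are evicted until every configured limit is satisfied.
    """
    window = deque()
    total = 0
    peak_bytes = 0
    peak_objects = 0
    for size in sizes:
        window.append(size)
        total += size
        # Evict from the old end while a limit is exceeded, but never
        # drop the object that was just added.
        while len(window) > 1 and (
            (max_objects is not None and len(window) > max_objects)
            or (max_bytes is not None and total > max_bytes)
        ):
            total -= window.popleft()
        peak_bytes = max(peak_bytes, total)
        peak_objects = max(peak_objects, len(window))
    return peak_bytes, peak_objects


small_objects = [1_000] * 50        # e.g. many small source files
large_objects = [1_000_000] * 50    # e.g. a run of big blobs

# Fixed object count: memory follows the repository content.
by_count_small = peak_window_usage(small_objects, max_objects=10)
by_count_large = peak_window_usage(large_objects, max_objects=10)

# Fixed memory budget: the object count adapts instead.
by_bytes_small = peak_window_usage(small_objects, max_bytes=100_000)
by_bytes_large = peak_window_usage(large_objects, max_bytes=100_000)
```

With a fixed count of 10 objects the peak memory differs by a factor of 
1000 between the two inputs, while with a byte budget the window simply 
holds more small objects and fewer large ones.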

Right now the window is limited by default to 10 objects, and window 
memory usage is unlimited.  This could be reworked so that the object 
count, while still being limited to avoid pathological cases, could be 
much higher, and the window memory usage is always limited by default.  
That default memory limit could be scaled according to the available 
resources on the system.  But if the user wants to play with this, then 
a memory usage parameter is much easier to understand, and its 
influence on system load is more directly observable.
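
As a rough sketch of what "scaled according to the available resources" 
could look like, the snippet below splits a total memory budget across 
packing threads, since the exchange quoted above indicates that 
pack.windowMemory is applied per thread.  The helper names and the 12 GiB 
budget are assumptions for illustration only; git does not ship such a 
policy.

```python
import os


def per_thread_window_memory(total_bytes, threads=None):
    """Divide a total window-memory budget evenly across threads.

    Hypothetical policy: 'threads' defaults to the number of CPUs
    reported by the OS, falling back to 1 if that is unknown.
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return max(1, total_bytes // max(1, threads))


def format_git_size(n):
    """Render a byte count using the k/m/g suffixes git config accepts."""
    for suffix, unit in (("g", 1024**3), ("m", 1024**2), ("k", 1024)):
        if n >= unit and n % unit == 0:
            return f"{n // unit}{suffix}"
    return str(n)


# Example: a 12 GiB budget shared by 8 threads.
budget = 12 * 1024**3
share = per_thread_window_memory(budget, threads=8)
setting = format_git_size(share)
```

The resulting string could then be applied with something like 
"git config pack.windowMemory 1536m", leaving the object-count window 
as a secondary safety limit.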


Nicolas

