From: Nicolas Pitre <nico@fluxnic.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Jay Soffian <jaysoffian@gmail.com>,
git <git@vger.kernel.org>, Shawn Pearce <spearce@spearce.org>
Subject: Re: gc --aggressive
Date: Sat, 28 Apr 2012 12:42:26 -0400 (EDT)
Message-ID: <alpine.LFD.2.02.1204281151290.21030@xanadu.home>
In-Reply-To: <7vmx6am1h9.fsf@alter.siamese.dyndns.org>

[ coming late to this thread -- thanks to peff who pulled my attention ]

On Tue, 17 Apr 2012, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > On Tue, Apr 17, 2012 at 03:17:28PM -0700, Junio C Hamano wrote:
> >
> >> > How many cores are there on this box? Have you tried setting
> >> > pack.windowMemory to (12 / # of cores) or thereabouts?
> >>
> >> Hrm, from the end-user's point of view, it appears that pack.windowMemory
> >> ought to mean the total without having to worry about the division of it
> >> across threads (which the implementation should be responsible for).
> >
> > Agreed. I had to look in the code to check which it meant. I'm not sure
> > we can change it without regressing existing users, though.
>
> This is a tangent, but I noticed that the canned settings for "aggressive"
> use an arbitrarily hardcoded value of depth=250 and window=250 (tweakable
> with gc.aggressiveWindow).
>
> Even though a shallower depth does cause base candidates with too long a
> chain hanging to be evicted prematurely while it is still in window and
> will lead to smaller memory consumption, I do not think the value of
> "depth" affects the pack-time memory consumption too much. But the
> runtime performance of the resulting pack may not be great (in the worst
> case you would have to undelta 249 times to get to the object data). We
> may want to loosen it a bit.

I think people are having misconceptions about the definition of the
word "aggressive".

This option is, well, aggressive. By definition this is not meant to be
"nice". This is not meant to be fast, or light on memory usage, etc.
This means "achieve as much damage as you can" to reduce the pack size.

If people are using it every night then they must be masochists, or
attracted by violence, or getting a bit too casual with word
definitions.

So if being --aggressive hurts, then don't do it.

If people want a loosened version, it would be more appropriate to
introduce a --mild, or --bold, or --disruptive option. In the same
vein, an --insane option could even be introduced to go even further
than --aggressive.

This being said, this is no excuse for regressions though. If git is
eating up much more memory than it used to, provided with the same
repository and repacking parameters as before, then this certainly
needs fixing. But making --aggressive less so is not a fix.

> Also it might make sense to make the window size a bit more flexible
> depending on the nature of your history (you would get bigger benefit with
> larger window when your history has fine grained commits; if there are not
> many few-liner commits, larger window may not help you that much).

How do you detect the nature of a repository's history? That's the hard
part. Because it should be auto-detected, as most users won't make a good
guess for the best parameter value to use.

Anyway, I think that the window size in terms of objects is a bad
parameter. Historically that is the first thing we implemented. But the
window _memory_ usage is probably a better setting to use. The delta
search cost is directly proportional to the amount of data to process
and that can be controlled with --window-memory, with the ability to
scale up and down the number of objects in the window. Keeping the
number of objects constant makes memory usage totally random since this
depends on the repository content, and the computing cost to process it
is highly unpredictable. This is very counter-intuitive for users.
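
To illustrate the point above, here is a toy sketch in Python. It is not git's delta search code: the function name, the object sizes and the limits are all made up for the example, and it only models how much data sits in a sliding window under each kind of limit.

```python
from collections import deque


def peak_window_bytes(sizes, max_objects=None, max_bytes=None):
    """Return the peak number of bytes held in a sliding window.

    Toy model only: `sizes` is a list of hypothetical object sizes,
    visited in order. The oldest entries are evicted until both limits
    hold, but the newest object is always kept.
    """
    window = deque()
    held = 0
    peak = 0
    for size in sizes:
        window.append(size)
        held += size
        while len(window) > 1 and (
            (max_objects is not None and len(window) > max_objects)
            or (max_bytes is not None and held > max_bytes)
        ):
            held -= window.popleft()
        peak = max(peak, held)
    return peak


# Two hypothetical repositories with very different object sizes.
small = [2_000] * 100
large = [5_000_000] * 100

# Fixed object count: memory use depends entirely on the content.
count_small = peak_window_bytes(small, max_objects=10)
count_large = peak_window_bytes(large, max_objects=10)

# Memory limit with a generous object cap: memory use is bounded and
# the number of objects in the window adapts to the content.
mem_small = peak_window_bytes(small, max_objects=250, max_bytes=20_000_000)
mem_large = peak_window_bytes(large, max_objects=250, max_bytes=20_000_000)

print(count_small, count_large)
print(mem_small, mem_large)
```

With a fixed count of 10 objects, the two made-up repositories differ in window memory by a factor of 2500. With a memory limit, the large one is capped at the limit while the small one simply fits more objects in the window.
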

Right now the window is limited by default to 10 objects, and window
memory usage is unlimited. This could be reworked so that the object
number, while still being limited to avoid pathological cases, could be
much higher, and the window memory usage is always limited by default.
That default memory usage could be scaled according to the available
resources on the system. But if the user wants to play with this, then
a memory usage parameter is much easier to understand, and its
influence on system load is more directly observable.

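
As discussed earlier in the thread, pack.windowMemory currently applies to each thread, so a total budget has to be divided by hand. A minimal sketch of that arithmetic in Python follows. The helper names are hypothetical, the 12 GiB and 8 threads are example numbers only, and nothing here is what git does by default.

```python
def per_thread_window_memory(total_budget_bytes, threads):
    """Split a total window memory budget evenly across threads."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return total_budget_bytes // threads


def as_git_size(n_bytes):
    """Format a byte count in MiB using git's 'm' size suffix."""
    mib = max(1, n_bytes // (1024 * 1024))
    return f"{mib}m"


# Example numbers: a 12 GiB budget shared by 8 packing threads.
budget = 12 * 1024**3
share = per_thread_window_memory(budget, 8)
setting = as_git_size(share)

# The resulting value could then be applied with something like:
#   git config pack.windowMemory <setting>
print(setting)
```
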
Nicolas