git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gc --aggressive: make it really aggressive
  2007-12-06  6:09         ` Linus Torvalds
@ 2007-12-06 12:03           ` Johannes Schindelin
  2007-12-06 13:42             ` Theodore Tso
                               ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 12:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git, gitster


The default was not to change the window or depth at all.  As suggested
by Jon Smirl, Linus Torvalds and others, default to

	--window=250 --depth=250

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Wed, 5 Dec 2007, Linus Torvalds wrote:

	> On Thu, 6 Dec 2007, Daniel Berlin wrote:
	> > 
	> > Actually, it turns out that git-gc --aggressive does this dumb 
	> > thing to pack files sometimes regardless of whether you 
	> > converted from an SVN repo or not.
	> 
	> Absolutely. git --aggressive is mostly dumb. It's really only 
	> useful for the case of "I know I have a *really* bad pack, and I 
	> want to throw away all the bad packing decisions I have done".
	>
	> [...]
	> 
	> So the equivalent of "git gc --aggressive" - but done *properly* 
	> - is to do (overnight) something like
	> 
	> 	git repack -a -d --depth=250 --window=250

	How about this, then?
	
 builtin-gc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 799c263..c6806d3 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -23,7 +23,7 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static int pack_refs = 1;
-static int aggressive_window = -1;
+static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 20;
 
@@ -192,6 +192,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (aggressive) {
 		append_option(argv_repack, "-f", MAX_ADD);
+		append_option(argv_repack, "--depth=250", MAX_ADD);
 		if (aggressive_window > 0) {
 			sprintf(buf, "--window=%d", aggressive_window);
 			append_option(argv_repack, buf, MAX_ADD);
-- 
1.5.3.7.2157.g9598e

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 12:03           ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
@ 2007-12-06 13:42             ` Theodore Tso
  2007-12-06 14:15               ` Nicolas Pitre
  2007-12-06 14:22             ` Pierre Habouzit
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2007-12-06 13:42 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
	gitster

On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
> 
> The default was not to change the window or depth at all.  As suggested
> by Jon Smirl, Linus Torvalds and others, default to
> 
> 	--window=250 --depth=250

I'd also suggest adding a comment in the man pages that this should
only be done rarely, and that it can potentially take a *long* time
(i.e., overnight) for big repositories, and in general it's not worth
the effort to use --aggressive.

Apologies to Linus and to the gcc folks, since I was the one who
originally coded up gc --aggressive, and at the time my intent was
"rarely does it make sense, and it may take a long time".  The reason
why I didn't make the default --window and --depth larger is because
at the time the biggest repo I had easy access to was the Linux
kernel's, and there you rapidly hit diminishing returns at much
smaller numbers, so there was no real point in using --window=250
--depth=250.

Linus later pointed out that what we *really* should do is at some
point was to change repack -f to potentially retry to find a better
delta, but to reuse the existing delta if it was no worse.  That
automatically does the right thing in the case where you had
previously done a repack with --window=<large n> --depth=<large n>,
but then later try using "gc --agressive", which ends up doing a worse
job and throwing away the information from the previous repack with
large window and depth sizes.  Unfortunately no one ever got around to
implementing that.

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 13:42             ` Theodore Tso
@ 2007-12-06 14:15               ` Nicolas Pitre
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2007-12-06 14:15 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Johannes Schindelin, Linus Torvalds, Daniel Berlin, David Miller,
	ismail, gcc, git, gitster

On Thu, 6 Dec 2007, Theodore Tso wrote:

> Linus later pointed out that what we *really* should do is at some
> point was to change repack -f to potentially retry to find a better
> delta, but to reuse the existing delta if it was no worse.  That
> automatically does the right thing in the case where you had
> previously done a repack with --window=<large n> --depth=<large n>,
> but then later try using "gc --agressive", which ends up doing a worse
> job and throwing away the information from the previous repack with
> large window and depth sizes.  Unfortunately no one ever got around to
> implementing that.

I did start looking at it, but there are subtle issues to consider, such 
as making sure not to create delta loops.  Currently this is avoided by 
never involving already reused deltas in new delta chains, except for 
edge base objects.

IOW, this requires some head scratching which I didn't have the time for 
so far.


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 12:03           ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
  2007-12-06 13:42             ` Theodore Tso
@ 2007-12-06 14:22             ` Pierre Habouzit
  2007-12-06 15:55               ` Johannes Schindelin
  2007-12-06 15:30             ` Harvey Harrison
  2009-03-18 16:01             ` Johannes Schindelin
  3 siblings, 1 reply; 13+ messages in thread
From: Pierre Habouzit @ 2007-12-06 14:22 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
	gitster

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
> 
> The default was not to change the window or depth at all.  As suggested
> by Jon Smirl, Linus Torvalds and others, default to
> 
> 	--window=250 --depth=250

  well, this will explode on many quite reasonnably sized systems. This
should also use a memory-limit that could be auto-guessed from the
system total physical memory (50% of the actual memory could be a good
idea e.g.).

  On very large repositories, using that on the e.g. linux kernel, swaps
like hell on a machine with 1Go of ram, and almost nothing running on it
(less than 200Mo of ram actually used)

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 12:03           ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
  2007-12-06 13:42             ` Theodore Tso
  2007-12-06 14:22             ` Pierre Habouzit
@ 2007-12-06 15:30             ` Harvey Harrison
  2007-12-06 15:56               ` Johannes Schindelin
  2007-12-06 16:19               ` Linus Torvalds
  2009-03-18 16:01             ` Johannes Schindelin
  3 siblings, 2 replies; 13+ messages in thread
From: Harvey Harrison @ 2007-12-06 15:30 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
	gitster

Wow

/usr/bin/time git repack -a -d -f --window=250 --depth=250


23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps

-r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26
pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
-r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack


That extra delta depth really does make a difference.  Just over a
300MB pack in the end, for all gcc branches/tags as of last night.

Cheers,

Harvey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 14:22             ` Pierre Habouzit
@ 2007-12-06 15:55               ` Johannes Schindelin
  2007-12-06 17:05                 ` David Kastrup
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 15:55 UTC (permalink / raw)
  To: Pierre Habouzit
  Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
	gitster

Hi,

On Thu, 6 Dec 2007, Pierre Habouzit wrote:

> On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
> > 
> > The default was not to change the window or depth at all.  As 
> > suggested by Jon Smirl, Linus Torvalds and others, default to
> > 
> > 	--window=250 --depth=250
> 
>   well, this will explode on many quite reasonnably sized systems. This 
> should also use a memory-limit that could be auto-guessed from the 
> system total physical memory (50% of the actual memory could be a good 
> idea e.g.).
> 
>   On very large repositories, using that on the e.g. linux kernel, swaps 
> like hell on a machine with 1Go of ram, and almost nothing running on it 
> (less than 200Mo of ram actually used)

Yes.

However, I think that --aggressive should be aggressive, and if you decide 
to run it on a machine which lacks the muscle to be aggressive, well, you 
should have known better.

The upside: if you run this on a strong machine and clone it to a weak 
machine, you'll still have the benefit of a small pack (and you should 
mark it as .keep, too, to keep the benefit...)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 15:30             ` Harvey Harrison
@ 2007-12-06 15:56               ` Johannes Schindelin
  2007-12-06 16:19               ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 15:56 UTC (permalink / raw)
  To: Harvey Harrison
  Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
	gitster

Hi,

On Thu, 6 Dec 2007, Harvey Harrison wrote:

> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

Wow.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 15:30             ` Harvey Harrison
  2007-12-06 15:56               ` Johannes Schindelin
@ 2007-12-06 16:19               ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2007-12-06 16:19 UTC (permalink / raw)
  To: Harvey Harrison
  Cc: Johannes Schindelin, Daniel Berlin, David Miller, ismail, gcc,
	Git Mailing List, Junio C Hamano



On Thu, 6 Dec 2007, Harvey Harrison wrote:
> 
> 7:41:25elapsed 86%CPU

Heh. And this is why you want to do it exactly *once*, and then just 
export the end result for others ;)

> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

But yeah, especially if you allow longer delta chains, the end result can 
be much smaller (and what makes the one-time repack more expensive is the 
window size, not the delta chain - you could make the delta chains longer 
with no cost overhead at packing time)

HOWEVER. 

The longer delta chains do make it potentially much more expensive to then 
use old history. So there's a trade-off. And quite frankly, a delta depth 
of 250 is likely going to cause overflows in the delta cache (which is 
only 256 entries in size *and* it's a hash, so it's going to start having 
hash conflicts long before hitting the 250 depth limit).

So when I said "--depth=250 --window=250", I chose those numbers more as 
an example of extremely aggressive packing, and I'm not at all sure that 
the end result is necessarily wonderfully usable. It's going to save disk 
space (and network bandwidth - the delta's will be re-used for the network 
protocol too!), but there are definitely downsides too, and using long 
delta chains may simply not be worth it in practice.

(And some of it might just want to have git tuning, ie if people think 
that long deltas are worth it, we could easily just expand on the delta 
hash, at the cost of some more memory used!)

That said, the good news is that working with *new* history will not be 
affected negatively, and if you want to be _really_ sneaky, there are ways 
to say "create a pack that contains the history up to a version one year 
ago, and be very aggressive about those old versions that we still want to 
have around, but do a separate pack for newer stuff using less aggressive 
parameters"

So this is something that can be tweaked, although we don't really have 
any really nice interfaces for stuff like that (ie the git delta cache 
size is hardcoded in the sources and cannot be set in the config file, and 
the "pack old history more aggressively" involves some manual scripting 
and knowing how "git pack-objects" works rather than any nice simple 
command line switch).

So the thing to take away from this is:
 - git is certainly flexible as hell
 - .. but to get the full power you may need to tweak things
 - .. happily you really only need to have one person to do the tweaking, 
   and the tweaked end results will be available to others that do not 
   need to know/care.

And whether the difference between 320MB and 500MB is worth any really 
involved tweaking (considering the potential downsides), I really don't 
know. Only testing will tell.

			Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 15:55               ` Johannes Schindelin
@ 2007-12-06 17:05                 ` David Kastrup
  0 siblings, 0 replies; 13+ messages in thread
From: David Kastrup @ 2007-12-06 17:05 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller,
	ismail, gcc, git, gitster

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> However, I think that --aggressive should be aggressive, and if you
> decide to run it on a machine which lacks the muscle to be aggressive,
> well, you should have known better.

That's a rather cheap shot.  "you should have known better" than
expecting to be able to use a documented command and option because the
git developers happened to have a nicer machine...

_How_ is one supposed to have known better?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
@ 2007-12-06 19:07 J.C. Pizarro
  0 siblings, 0 replies; 13+ messages in thread
From: J.C. Pizarro @ 2007-12-06 19:07 UTC (permalink / raw)
  To: David Kastrup, Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller,
	ismail, gcc, git, gitster

On 2007/12/06, David Kastrup <dak@gnu.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > However, I think that --aggressive should be aggressive, and if you
> > decide to run it on a machine which lacks the muscle to be aggressive,
> > well, you should have known better.
>
> That's a rather cheap shot.  "you should have known better" than
> expecting to be able to use a documented command and option because the
> git developers happened to have a nicer machine...
>
> _How_ is one supposed to have known better?
>
> --
> David Kastrup, Kriemhildstr. 15, 44793 Bochum

In GIT, the --aggressive option doesn't make it aggressive.
In GCC, the -Wall option doesn't enable all warnings.

                                                           #
It's a "Tie one to one" with the similar reputations.   #######
                             To have a rest in peace.      #
                                                           #
   J.C.Pizarro                                             #

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2007-12-06 12:03           ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
                               ` (2 preceding siblings ...)
  2007-12-06 15:30             ` Harvey Harrison
@ 2009-03-18 16:01             ` Johannes Schindelin
  2009-03-18 16:27               ` Teemu Likonen
  2009-03-18 18:02               ` Nicolas Pitre
  3 siblings, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2009-03-18 16:01 UTC (permalink / raw)
  To: git; +Cc: gitster

Hi,

On Thu, 6 Dec 2007, Johannes Schindelin wrote:

> 
> The default was not to change the window or depth at all.  As suggested
> by Jon Smirl, Linus Torvalds and others, default to
> 
> 	--window=250 --depth=250
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---

Guess what.  This is still unresolved, and yet somebody else had to be 
bitten by 'git gc --aggressive' being everything but aggressive.

So...  I think it is high time to resolve the issue, either by applying 
this patch with a delay of over one year, or by the pack wizards trying to 
implement that 'never fall back to a worse delta' idea mentioned in this 
thread.

Although I suggest, really, that implying --depth=250 --window=250 (unless 
overridden by the config) with --aggressive is not at all wrong.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2009-03-18 16:01             ` Johannes Schindelin
@ 2009-03-18 16:27               ` Teemu Likonen
  2009-03-18 18:02               ` Nicolas Pitre
  1 sibling, 0 replies; 13+ messages in thread
From: Teemu Likonen @ 2009-03-18 16:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, gitster

On 2009-03-18 17:01 (+0100), Johannes Schindelin wrote:

>> The default was not to change the window or depth at all. As
>> suggested by Jon Smirl, Linus Torvalds and others, default to
>> 
>> 	--window=250 --depth=250

> Guess what. This is still unresolved, and yet somebody else had to be
> bitten by 'git gc --aggressive' being everything but aggressive.

Pieter de Bie's tests seem to suggest that usually --window=50
--depth=50 gives about the same results than with higher values:

    http://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

I don't understand the issue very well myself so I really can't say what
would be a/the good value. Anyway, I agree that it would be nice if "git
gc --aggressive" were aggressive and a user wouldn't need to know about
"git repack" and its cryptical low-levelish options.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gc --aggressive: make it really aggressive
  2009-03-18 16:01             ` Johannes Schindelin
  2009-03-18 16:27               ` Teemu Likonen
@ 2009-03-18 18:02               ` Nicolas Pitre
  1 sibling, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-03-18 18:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, gitster

On Wed, 18 Mar 2009, Johannes Schindelin wrote:

> Hi,
> 
> On Thu, 6 Dec 2007, Johannes Schindelin wrote:
> 
> > 
> > The default was not to change the window or depth at all.  As suggested
> > by Jon Smirl, Linus Torvalds and others, default to
> > 
> > 	--window=250 --depth=250
> > 
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> 
> Guess what.  This is still unresolved, and yet somebody else had to be 
> bitten by 'git gc --aggressive' being everything but aggressive.
> 
> So...  I think it is high time to resolve the issue, either by applying 
> this patch with a delay of over one year, or by the pack wizards trying to 
> implement that 'never fall back to a worse delta' idea mentioned in this 
> thread.

This is just a bit complicated to implement (cycle avoidance, etc).

> Although I suggest, really, that implying --depth=250 --window=250 (unless 
> overridden by the config) with --aggressive is not at all wrong.

ACK.


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-03-18 18:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-06 19:07 [PATCH] gc --aggressive: make it really aggressive J.C. Pizarro
  -- strict thread matches above, loose matches on Subject: below --
2007-12-06  3:47 Git and GCC Daniel Berlin
2007-12-06  4:20 ` David Miller
2007-12-06  4:32   ` Daniel Berlin
2007-12-06  4:48     ` David Miller
2007-12-06  5:11       ` Daniel Berlin
2007-12-06  6:09         ` Linus Torvalds
2007-12-06 12:03           ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
2007-12-06 13:42             ` Theodore Tso
2007-12-06 14:15               ` Nicolas Pitre
2007-12-06 14:22             ` Pierre Habouzit
2007-12-06 15:55               ` Johannes Schindelin
2007-12-06 17:05                 ` David Kastrup
2007-12-06 15:30             ` Harvey Harrison
2007-12-06 15:56               ` Johannes Schindelin
2007-12-06 16:19               ` Linus Torvalds
2009-03-18 16:01             ` Johannes Schindelin
2009-03-18 16:27               ` Teemu Likonen
2009-03-18 18:02               ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).