* Re: [PATCH] gc --aggressive: make it really aggressive
@ 2007-12-06 19:07 J.C. Pizarro
0 siblings, 0 replies; 13+ messages in thread
From: J.C. Pizarro @ 2007-12-06 19:07 UTC (permalink / raw)
To: David Kastrup, Johannes Schindelin
Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller,
ismail, gcc, git, gitster
On 2007/12/06, David Kastrup <dak@gnu.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > However, I think that --aggressive should be aggressive, and if you
> > decide to run it on a machine which lacks the muscle to be aggressive,
> > well, you should have known better.
>
> That's a rather cheap shot. "you should have known better" than
> expecting to be able to use a documented command and option because the
> git developers happened to have a nicer machine...
>
> _How_ is one supposed to have known better?
>
> --
> David Kastrup, Kriemhildstr. 15, 44793 Bochum
In GIT, the --aggressive option doesn't make it aggressive.
In GCC, the -Wall option doesn't enable all warnings.
It's a "Tie one to one" with the similar reputations.
To have a rest in peace.
J.C.Pizarro
* Re: Git and GCC
@ 2007-12-06 3:47 Daniel Berlin
2007-12-06 4:20 ` David Miller
0 siblings, 1 reply; 13+ messages in thread
From: Daniel Berlin @ 2007-12-06 3:47 UTC (permalink / raw)
To: David Miller; +Cc: ismail, gcc, git
On 12/5/07, David Miller <davem@davemloft.net> wrote:
> From: "Daniel Berlin" <dberlin@dberlin.org>
> Date: Wed, 5 Dec 2007 21:41:19 -0500
>
> > It is true I gave up quickly, but this is mainly because i don't like
> > to fight with my tools.
> > I am quite fine with a distributed workflow, I now use 8 or so gcc
> > branches in mercurial (auto synced from svn) and merge a lot between
> > them. I wanted to see if git would sanely let me manage the commits
> > back to svn. After fighting with it, i gave up and just wrote a
> > python extension to hg that lets me commit non-svn changesets back to
> > svn directly from hg.
>
> I find it ironic that you were even willing to write tools to
> facilitate your hg based gcc workflow.
Why?
> That really shows what your
> thinking is on this matter, in that you're willing to put effort
> towards making hg work better for you but you're not willing to expend
> that level of effort to see if git can do so as well.
See, now you claim to know my thinking.
I went back to hg because GIT's space usage wasn't even in the
ballpark, and I couldn't get git-svn rebase to update the revs after the
initial import (even though I had properly used a rewriteRoot).
The size is clearly not just svn data, it's in the git pack itself.
I spent a long time working on SVN to reduce its space usage (on the repo
side, cleaning up the client side, and giving the svn devs a path to
reduce it further), as well as on UI issues, and I really don't feel like
having to do the same for GIT.
I'm tired of having to spend a large amount of effort to get my tools
to work. If the community wants to find and fix the problem, i've
already said repeatedly i'll happily give over my repo, data,
whatever. You are correct i am not going to spend even more effort
when i can be productive with something else much quicker. The devil
i know (committing to svn) is better than the devil i don't (diving
into git source code and finding/fixing what is causing this space
blowup).
The python extension took me a few hours (< 4).
In git, i spent these hours waiting for git-gc to finish.
> This is what really eats me from the inside about your dissatisfaction
> with git. Your analysis seems to be a self-fulfilling prophecy, and
> that's totally unfair to both hg and git.
Oh?
You seem to be taking this awfully personally.
I came into this completely open minded. Really, I did (i'm sure
you'll claim otherwise).
GIT people told me it would work great and i'd have a really small git
repo and be able to commit back to svn.
I tried it.
It didn't work out.
It doesn't seem to be usable for whatever reason.
I'm happy to give details, data, whatever.
I made the engineering decision that my effort would be better spent
doing something I knew I could do quickly (make hg commit back to svn
for my purposes) than trying to improve larger issues in GIT (UI and
space usage). That took me a few hours, and I was happy again.
I would have been incredibly happy to have git just have come up with
a 400 meg gcc repository, and to be happily committing away from
git-svn to gcc's repository ...
But it didn't happen.
So far, you have yet to actually do anything but incorrectly tell me
what I am thinking.
I'll probably try again in 6 months, and maybe it will be better.
* Re: Git and GCC
2007-12-06 3:47 Git and GCC Daniel Berlin
@ 2007-12-06 4:20 ` David Miller
2007-12-06 4:32 ` Daniel Berlin
0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2007-12-06 4:20 UTC (permalink / raw)
To: dberlin; +Cc: ismail, gcc, git
From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 22:47:01 -0500
> The size is clearly not just svn data, it's in the git pack itself.
And other users have shown much smaller metadata from a GIT import,
and yes those are including all of the repository history and branches
not just the trunk.
* Re: Git and GCC
2007-12-06 4:20 ` David Miller
@ 2007-12-06 4:32 ` Daniel Berlin
2007-12-06 4:48 ` David Miller
0 siblings, 1 reply; 13+ messages in thread
From: Daniel Berlin @ 2007-12-06 4:32 UTC (permalink / raw)
To: David Miller; +Cc: ismail, gcc, git
On 12/5/07, David Miller <davem@davemloft.net> wrote:
> From: "Daniel Berlin" <dberlin@dberlin.org>
> Date: Wed, 5 Dec 2007 22:47:01 -0500
>
> > The size is clearly not just svn data, it's in the git pack itself.
>
> And other users have shown much smaller metadata from a GIT import,
> and yes those are including all of the repository history and branches
> not just the trunk.
I followed the instructions in the tutorials.
I followed the instructions given to me by the people who created these.
I came up with a 1.5 gig pack file.
You want to help, or you want to argue with me?
Right now it sounds like you are trying to blame me or make it look
like I did something wrong.
You are, of course, welcome to try it yourself.
I can give you the exact commands I used, and with git
1.5.3.7, they will give you a 1.5 gig pack file.
* Re: Git and GCC
2007-12-06 4:32 ` Daniel Berlin
@ 2007-12-06 4:48 ` David Miller
2007-12-06 5:11 ` Daniel Berlin
0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2007-12-06 4:48 UTC (permalink / raw)
To: dberlin; +Cc: ismail, gcc, git
From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 23:32:52 -0500
> On 12/5/07, David Miller <davem@davemloft.net> wrote:
> > From: "Daniel Berlin" <dberlin@dberlin.org>
> > Date: Wed, 5 Dec 2007 22:47:01 -0500
> >
> > > The size is clearly not just svn data, it's in the git pack itself.
> >
> > And other users have shown much smaller metadata from a GIT import,
> > and yes those are including all of the repository history and branches
> > not just the trunk.
> I followed the instructions in the tutorials.
> I followed the instructions given to by people who created these.
> I came up with a 1.5 gig pack file.
> You want to help, or you want to argue with me.
Several people replied in this thread showing what options can lead to
smaller pack files.
They also listed what the GIT limitations are that would affect the
kind of work you are doing, which seemed to mostly deal with the high
space cost of branching and tags when converting to/from SVN repos.
* Re: Git and GCC
2007-12-06 4:48 ` David Miller
@ 2007-12-06 5:11 ` Daniel Berlin
2007-12-06 6:09 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Daniel Berlin @ 2007-12-06 5:11 UTC (permalink / raw)
To: David Miller; +Cc: ismail, gcc, git
On 12/5/07, David Miller <davem@davemloft.net> wrote:
> From: "Daniel Berlin" <dberlin@dberlin.org>
> Date: Wed, 5 Dec 2007 23:32:52 -0500
>
> > On 12/5/07, David Miller <davem@davemloft.net> wrote:
> > > From: "Daniel Berlin" <dberlin@dberlin.org>
> > > Date: Wed, 5 Dec 2007 22:47:01 -0500
> > >
> > > > The size is clearly not just svn data, it's in the git pack itself.
> > >
> > > And other users have shown much smaller metadata from a GIT import,
> > > and yes those are including all of the repository history and branches
> > > not just the trunk.
> > I followed the instructions in the tutorials.
> > I followed the instructions given to by people who created these.
> > I came up with a 1.5 gig pack file.
> > You want to help, or you want to argue with me.
>
> Several people replied in this thread showing what options can lead to
> smaller pack files.
Actually, one person did, but that's okay, let's assume it was several.
I am currently trying Harvey's options.
I asked about using the pre-existing repos so i didn't have to do
this, but they were all
1. Done using read-only imports or
2. Don't contain full history
(IE the one that contains full history that is often posted here was
done as a read only import and thus doesn't have the metadata).
> They also listed what the GIT limitations are that would affect the
> kind of work you are doing, which seemed to mostly deal with the high
> space cost of branching and tags when converting to/from SVN repos.
Actually, it turns out that git-gc --aggressive does this dumb thing
to pack files sometimes regardless of whether you converted from an
SVN repo or not.
* Re: Git and GCC
2007-12-06 5:11 ` Daniel Berlin
@ 2007-12-06 6:09 ` Linus Torvalds
2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2007-12-06 6:09 UTC (permalink / raw)
To: Daniel Berlin; +Cc: David Miller, ismail, gcc, git
On Thu, 6 Dec 2007, Daniel Berlin wrote:
>
> Actually, it turns out that git-gc --aggressive does this dumb thing
> to pack files sometimes regardless of whether you converted from an
> SVN repo or not.
Absolutely. git gc --aggressive is mostly dumb. It's really only useful for
the case of "I know I have a *really* bad pack, and I want to throw away
all the bad packing decisions I have done".
To explain this, it's worth explaining (you are probably aware of it, but
let me go through the basics anyway) how git delta-chains work, and how
they are so different from most other systems.
In other SCMs, a delta-chain is generally fixed. It might be "forwards"
or "backwards", and it might evolve a bit as you work with the repository,
but generally it's a chain of changes to a single file represented as some
kind of single SCM entity. In CVS, it's obviously the *,v file, and a lot
of other systems do rather similar things.
Git also does delta-chains, but it does them a lot more "loosely". There
is no fixed entity. Deltas are generated against any random other version
that git deems to be a good delta candidate (with various fairly
successful heuristics), and there are absolutely no hard grouping rules.
This is generally a very good thing. It's good for various conceptual
reasons (ie git internally never really even needs to care about the whole
revision chain - it doesn't really think in terms of deltas at all), but
it's also great because getting rid of the inflexible delta rules means
that git doesn't have any problems at all with merging two files together,
for example - there simply are no arbitrary *,v "revision files" that have
some hidden meaning.
It also means that the choice of deltas is a much more open-ended
question. If you limit the delta chain to just one file, you really don't
have a lot of choices on what to do about deltas, but in git, it really
can be a totally different issue.
And this is where the really badly named "--aggressive" comes in. While
git generally tries to re-use delta information (because it's a good idea,
and it doesn't waste CPU time re-finding all the good deltas we found
earlier), sometimes you want to say "let's start all over, with a blank
slate, and ignore all the previous delta information, and try to generate
a new set of deltas".
So "--aggressive" is not really about being aggressive, but about wasting
CPU time re-doing a decision we already did earlier!
*Sometimes* that is a good thing. Some import tools in particular could
generate really horribly bad deltas. Anything that uses "git fast-import",
for example, likely doesn't have much of a great delta layout, so it might
be worth saying "I want to start from a clean slate".
But almost always, in other cases, it's actually a really bad thing to do.
It's going to waste CPU time, and especially if you had actually done a
good job at deltaing earlier, the end result isn't going to re-use all
those *good* deltas you already found, so you'll actually end up with a
much worse end result too!
I'll send a patch to Junio to just remove the "git gc --aggressive"
documentation. It can be useful, but it generally is useful only when you
really understand at a very deep level what it's doing, and that
documentation doesn't help you do that.
Generally, doing incremental "git gc" is the right approach, and better
than doing "git gc --aggressive". It's going to re-use old deltas, and
when those old deltas can't be found (the reason for doing incremental GC
in the first place!) it's going to create new ones.
On the other hand, it's definitely true that an "initial import of a long
and involved history" is a point where it can be worth spending a lot of
time finding the *really*good* deltas. Then, every user ever after (as
long as they don't use "git gc --aggressive" to undo it!) will get the
advantage of that one-time event. So especially for big projects with a
long history, it's probably worth doing some extra work, telling the delta
finding code to go wild.
So the equivalent of "git gc --aggressive" - but done *properly* - is to
do (overnight) something like
git repack -a -d --depth=250 --window=250
where that depth thing is just about how deep the delta chains can be
(make them longer for old history - it's worth the space overhead), and
the window thing is about how big an object window we want each delta
candidate to scan.
And here, you might well want to add the "-f" flag (which is the "drop all
old deltas" flag), since you now are actually trying to make sure that this one
actually finds good candidates.
And then it's going to take forever and a day (ie a "do it overnight"
thing). But the end result is that everybody downstream from that
repository will get much better packs, without having to spend any effort
on it themselves.
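As a rough illustration of the two knobs described above (a toy model, not git's actual packing code): the sketch assumes objects arrive in a fixed order, treats "window" as how many preceding objects each one may try as a delta base, and "depth" as the longest chain allowed. The names delta_cost and pack are hypothetical, and the prefix/suffix cost is only a stand-in for a real delta encoder.

```python
from typing import List, Tuple


def delta_cost(base: bytes, target: bytes) -> int:
    """Toy cost: bytes of target not covered by a prefix/suffix shared with base."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    return len(target) - prefix - suffix


def pack(objects: List[bytes], window: int, depth: int) -> Tuple[int, List[int]]:
    """Greedy toy packer: returns (total stored bytes, chain depth per object)."""
    depths: List[int] = []
    total = 0
    for i, obj in enumerate(objects):
        best_cost, best_depth = len(obj), 0  # fallback: store the object whole
        # Only the `window` objects just before this one are tried as bases.
        for j in range(max(0, i - window), i):
            if depths[j] + 1 > depth:
                continue  # using this base would make the chain too long
            cost = delta_cost(objects[j], obj)
            if cost < best_cost:
                best_cost, best_depth = cost, depths[j] + 1
        depths.append(best_depth)
        total += best_cost
    return total, depths


# Ten 101-byte "versions" that differ only in their last byte.
versions = [b"x" * 100 + bytes([i]) for i in range(10)]
for window in (0, 1, 10):
    total, depths = pack(versions, window=window, depth=3)
    print(f"window={window:2d} depth=3 -> {total} bytes, chain depths {depths}")
```

On this input the total drops from 1010 bytes (no deltas) to 310 (window of 1, chains cut off at depth 3) to 110 (window of 10, every object deltas against the first). Real repositories will not behave this neatly; the point is only that a wider window gives each object more candidates, while the depth cap limits how far chains may grow.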
Linus
* [PATCH] gc --aggressive: make it really aggressive
2007-12-06 6:09 ` Linus Torvalds
@ 2007-12-06 12:03 ` Johannes Schindelin
2007-12-06 13:42 ` Theodore Tso
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 12:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git, gitster
The default was not to change the window or depth at all. As suggested
by Jon Smirl, Linus Torvalds and others, default to
--window=250 --depth=250
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Wed, 5 Dec 2007, Linus Torvalds wrote:
> On Thu, 6 Dec 2007, Daniel Berlin wrote:
> >
> > Actually, it turns out that git-gc --aggressive does this dumb
> > thing to pack files sometimes regardless of whether you
> > converted from an SVN repo or not.
>
> Absolutely. git --aggressive is mostly dumb. It's really only
> useful for the case of "I know I have a *really* bad pack, and I
> want to throw away all the bad packing decisions I have done".
>
> [...]
>
> So the equivalent of "git gc --aggressive" - but done *properly*
> - is to do (overnight) something like
>
> git repack -a -d --depth=250 --window=250
How about this, then?
builtin-gc.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/builtin-gc.c b/builtin-gc.c
index 799c263..c6806d3 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -23,7 +23,7 @@ static const char * const builtin_gc_usage[] = {
 };

 static int pack_refs = 1;
-static int aggressive_window = -1;
+static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 20;

@@ -192,6 +192,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)

 	if (aggressive) {
 		append_option(argv_repack, "-f", MAX_ADD);
+		append_option(argv_repack, "--depth=250", MAX_ADD);
 		if (aggressive_window > 0) {
 			sprintf(buf, "--window=%d", aggressive_window);
 			append_option(argv_repack, buf, MAX_ADD);
--
1.5.3.7.2157.g9598e
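To make the effect of the two changed lines easier to see, here is a small sketch that mirrors only the "if (aggressive)" block visible in the diff. The function name is hypothetical, and the rest of the repack argument list (which the patch does not show) is deliberately left out.

```python
from typing import List


def aggressive_repack_options(aggressive: bool, aggressive_window: int = 250) -> List[str]:
    """Options the patched `if (aggressive)` block would append for repack."""
    options: List[str] = []
    if aggressive:
        options.append("-f")           # already present before the patch
        options.append("--depth=250")  # the line the patch adds
        if aggressive_window > 0:      # the old default of -1 skipped this branch
            options.append(f"--window={aggressive_window}")
    return options


print(aggressive_repack_options(True))                         # patched default
print(aggressive_repack_options(True, aggressive_window=-1))   # window left alone
```

With the old default of -1 no --window option is appended at all, which matches the commit message's remark that the window and depth were not being changed.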
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
@ 2007-12-06 13:42 ` Theodore Tso
2007-12-06 14:15 ` Nicolas Pitre
2007-12-06 14:22 ` Pierre Habouzit
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2007-12-06 13:42 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
gitster
On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
>
> The default was not to change the window or depth at all. As suggested
> by Jon Smirl, Linus Torvalds and others, default to
>
> --window=250 --depth=250
I'd also suggest adding a comment in the man pages that this should
only be done rarely, and that it can potentially take a *long* time
(i.e., overnight) for big repositories, and in general it's not worth
the effort to use --aggressive.
Apologies to Linus and to the gcc folks, since I was the one who
originally coded up gc --aggressive, and at the time my intent was
"rarely does it make sense, and it may take a long time". The reason
why I didn't make the default --window and --depth larger is because
at the time the biggest repo I had easy access to was the Linux
kernel's, and there you rapidly hit diminishing returns at much
smaller numbers, so there was no real point in using --window=250
--depth=250.
Linus later pointed out that what we *really* should do at some
point is change repack -f to retry finding a better
delta, but reuse the existing delta if it is no worse. That
automatically does the right thing in the case where you had
previously done a repack with --window=<large n> --depth=<large n>,
but then later try using "gc --aggressive", which ends up doing a worse
job and throwing away the information from the previous repack with
large window and depth sizes. Unfortunately no one ever got around to
implementing that.
Regards,
- Ted
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 13:42 ` Theodore Tso
@ 2007-12-06 14:15 ` Nicolas Pitre
0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2007-12-06 14:15 UTC (permalink / raw)
To: Theodore Tso
Cc: Johannes Schindelin, Linus Torvalds, Daniel Berlin, David Miller,
ismail, gcc, git, gitster
On Thu, 6 Dec 2007, Theodore Tso wrote:
> Linus later pointed out that what we *really* should do is at some
> point was to change repack -f to potentially retry to find a better
> delta, but to reuse the existing delta if it was no worse. That
> automatically does the right thing in the case where you had
> previously done a repack with --window=<large n> --depth=<large n>,
> but then later try using "gc --aggressive", which ends up doing a worse
> job and throwing away the information from the previous repack with
> large window and depth sizes. Unfortunately no one ever got around to
> implementing that.
I did start looking at it, but there are subtle issues to consider, such
as making sure not to create delta loops. Currently this is avoided by
never involving already reused deltas in new delta chains, except for
edge base objects.
IOW, this requires some head scratching which I didn't have the time for
so far.
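One way to picture the delta-loop concern mentioned above is an explicit check before a reused delta is given a new base. This is only an illustrative sketch with hypothetical names (base_of, creates_cycle); it is not how pack-objects handles the problem, which per the message avoids it by keeping reused deltas out of new chains.

```python
from typing import Dict, Optional


def creates_cycle(base_of: Dict[str, str], obj: str, new_base: str) -> bool:
    """Return True if making new_base the delta base of obj would form a loop.

    base_of maps each deltified object to its current base; objects stored
    whole simply have no entry.
    """
    seen = set()
    current: Optional[str] = new_base
    # Walk down the chain from the proposed base; if we reach obj, obj would
    # end up (indirectly) depending on itself.
    while current is not None and current not in seen:
        if current == obj:
            return True
        seen.add(current)
        current = base_of.get(current)
    return False


# "c" is a delta against "b", which is a delta against "a" (stored whole).
chain = {"b": "a", "c": "b"}
print(creates_cycle(chain, "a", "c"))  # re-basing "a" on "c" would loop
print(creates_cycle(chain, "c", "a"))  # re-basing "c" on "a" is safe
```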
Nicolas
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
2007-12-06 13:42 ` Theodore Tso
@ 2007-12-06 14:22 ` Pierre Habouzit
2007-12-06 15:55 ` Johannes Schindelin
2007-12-06 15:30 ` Harvey Harrison
2009-03-18 16:01 ` Johannes Schindelin
3 siblings, 1 reply; 13+ messages in thread
From: Pierre Habouzit @ 2007-12-06 14:22 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
gitster
On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
>
> The default was not to change the window or depth at all. As suggested
> by Jon Smirl, Linus Torvalds and others, default to
>
> --window=250 --depth=250
Well, this will explode on many quite reasonably sized systems. This
should also use a memory limit that could be auto-guessed from the
system's total physical memory (50% of the actual memory could be a good
idea, e.g.).
On very large repositories (e.g. the Linux kernel), using that swaps
like hell on a machine with 1 GB of RAM and almost nothing running on it
(less than 200 MB of RAM actually used).
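A minimal sketch of the "50% of physical memory" idea, under stated assumptions: the helper names are hypothetical, the memory query relies on sysconf names available on Linux and most Unix systems, and the --window-memory option of git repack (documented in current git) limits the memory used by the delta window, which may or may not be the kind of cap meant here.

```python
import os


def half_of(total_bytes: int) -> str:
    """Format 50% of total_bytes as whole mebibytes with an 'm' suffix."""
    return f"{total_bytes // 2 // (1024 * 1024)}m"


def physical_memory_bytes() -> int:
    """Total physical memory, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    limit = half_of(physical_memory_bytes())
    # Only prints the command; running it is left to the reader.
    print(f"git repack -a -d -f --depth=250 --window=250 --window-memory={limit}")
```

On a machine with 1 GiB of RAM this suggests a 512m limit.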
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 14:22 ` Pierre Habouzit
@ 2007-12-06 15:55 ` Johannes Schindelin
2007-12-06 17:05 ` David Kastrup
0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 15:55 UTC (permalink / raw)
To: Pierre Habouzit
Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
gitster
Hi,
On Thu, 6 Dec 2007, Pierre Habouzit wrote:
> On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote:
> >
> > The default was not to change the window or depth at all. As
> > suggested by Jon Smirl, Linus Torvalds and others, default to
> >
> > --window=250 --depth=250
>
> well, this will explode on many quite reasonably sized systems. This
> should also use a memory-limit that could be auto-guessed from the
> system total physical memory (50% of the actual memory could be a good
> idea e.g.).
>
> On very large repositories, using that on the e.g. linux kernel, swaps
> like hell on a machine with 1Go of ram, and almost nothing running on it
> (less than 200Mo of ram actually used)
Yes.
However, I think that --aggressive should be aggressive, and if you decide
to run it on a machine which lacks the muscle to be aggressive, well, you
should have known better.
The upside: if you run this on a strong machine and clone it to a weak
machine, you'll still have the benefit of a small pack (and you should
mark it as .keep, too, to keep the benefit...)
Ciao,
Dscho
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 15:55 ` Johannes Schindelin
@ 2007-12-06 17:05 ` David Kastrup
0 siblings, 0 replies; 13+ messages in thread
From: David Kastrup @ 2007-12-06 17:05 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller,
ismail, gcc, git, gitster
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> However, I think that --aggressive should be aggressive, and if you
> decide to run it on a machine which lacks the muscle to be aggressive,
> well, you should have known better.
That's a rather cheap shot. "you should have known better" than
expecting to be able to use a documented command and option because the
git developers happened to have a nicer machine...
_How_ is one supposed to have known better?
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
2007-12-06 13:42 ` Theodore Tso
2007-12-06 14:22 ` Pierre Habouzit
@ 2007-12-06 15:30 ` Harvey Harrison
2007-12-06 15:56 ` Johannes Schindelin
2007-12-06 16:19 ` Linus Torvalds
2009-03-18 16:01 ` Johannes Schindelin
3 siblings, 2 replies; 13+ messages in thread
From: Harvey Harrison @ 2007-12-06 15:30 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
gitster
Wow
/usr/bin/time git repack -a -d -f --window=250 --depth=250
23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
-r--r--r-- 1 hharrison hharrison 29091872 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
-r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
That extra delta depth really does make a difference. Just over a
300MB pack in the end, for all gcc branches/tags as of last night.
Cheers,
Harvey
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 15:30 ` Harvey Harrison
@ 2007-12-06 15:56 ` Johannes Schindelin
2007-12-06 16:19 ` Linus Torvalds
1 sibling, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2007-12-06 15:56 UTC (permalink / raw)
To: Harvey Harrison
Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git,
gitster
Hi,
On Thu, 6 Dec 2007, Harvey Harrison wrote:
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
Wow.
Ciao,
Dscho
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 15:30 ` Harvey Harrison
2007-12-06 15:56 ` Johannes Schindelin
@ 2007-12-06 16:19 ` Linus Torvalds
1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2007-12-06 16:19 UTC (permalink / raw)
To: Harvey Harrison
Cc: Johannes Schindelin, Daniel Berlin, David Miller, ismail, gcc,
Git Mailing List, Junio C Hamano
On Thu, 6 Dec 2007, Harvey Harrison wrote:
>
> 7:41:25elapsed 86%CPU
Heh. And this is why you want to do it exactly *once*, and then just
export the end result for others ;)
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
But yeah, especially if you allow longer delta chains, the end result can
be much smaller (and what makes the one-time repack more expensive is the
window size, not the delta chain - you could make the delta chains longer
with no cost overhead at packing time)
HOWEVER.
The longer delta chains do make it potentially much more expensive to then
use old history. So there's a trade-off. And quite frankly, a delta depth
of 250 is likely going to cause overflows in the delta cache (which is
only 256 entries in size *and* it's a hash, so it's going to start having
hash conflicts long before hitting the 250 depth limit).
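A back-of-the-envelope estimate of the hash-conflict point, under simplifying assumptions that are not from the message: the cache is treated as direct-mapped (one entry per slot) and the hash as uniform. Only the 256-entry size and the fact that it is a hash come from the text above.

```python
def expected_occupied_slots(slots: int, entries: int) -> float:
    """Expected number of distinct slots hit when `entries` keys hash uniformly."""
    return slots * (1.0 - (1.0 - 1.0 / slots) ** entries)


occupied = expected_occupied_slots(256, 250)
evicted = 250 - occupied
print(f"a 250-deep chain occupies about {occupied:.0f} of 256 slots")
print(f"so about {evicted:.0f} entries would have been overwritten by a conflict")
```

Under these assumptions roughly 160 slots end up occupied, so on the order of 90 of the 250 chain entries would already have been pushed out by a conflict, well before the cache is nominally full.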
So when I said "--depth=250 --window=250", I chose those numbers more as
an example of extremely aggressive packing, and I'm not at all sure that
the end result is necessarily wonderfully usable. It's going to save disk
space (and network bandwidth - the deltas will be re-used for the network
protocol too!), but there are definitely downsides too, and using long
delta chains may simply not be worth it in practice.
(And some of it might just want to have git tuning, ie if people think
that long deltas are worth it, we could easily just expand on the delta
hash, at the cost of some more memory used!)
That said, the good news is that working with *new* history will not be
affected negatively, and if you want to be _really_ sneaky, there are ways
to say "create a pack that contains the history up to a version one year
ago, and be very aggressive about those old versions that we still want to
have around, but do a separate pack for newer stuff using less aggressive
parameters"
So this is something that can be tweaked, although we don't really have
any really nice interfaces for stuff like that (ie the git delta cache
size is hardcoded in the sources and cannot be set in the config file, and
the "pack old history more aggressively" involves some manual scripting
and knowing how "git pack-objects" works rather than any nice simple
command line switch).
So the thing to take away from this is:
- git is certainly flexible as hell
- .. but to get the full power you may need to tweak things
- .. happily you really only need to have one person to do the tweaking,
and the tweaked end results will be available to others that do not
need to know/care.
And whether the difference between 320MB and 500MB is worth any really
involved tweaking (considering the potential downsides), I really don't
know. Only testing will tell.
Linus
* Re: [PATCH] gc --aggressive: make it really aggressive
2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin
` (2 preceding siblings ...)
2007-12-06 15:30 ` Harvey Harrison
@ 2009-03-18 16:01 ` Johannes Schindelin
2009-03-18 16:27 ` Teemu Likonen
2009-03-18 18:02 ` Nicolas Pitre
3 siblings, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2009-03-18 16:01 UTC (permalink / raw)
To: git; +Cc: gitster
Hi,
On Thu, 6 Dec 2007, Johannes Schindelin wrote:
>
> The default was not to change the window or depth at all. As suggested
> by Jon Smirl, Linus Torvalds and others, default to
>
> --window=250 --depth=250
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
Guess what. This is still unresolved, and yet somebody else had to be
bitten by 'git gc --aggressive' being everything but aggressive.
So... I think it is high time to resolve the issue, either by applying
this patch with a delay of over one year, or by the pack wizards trying to
implement that 'never fall back to a worse delta' idea mentioned in this
thread.
Although I suggest, really, that implying --depth=250 --window=250 (unless
overridden by the config) with --aggressive is not at all wrong.
Ciao,
Dscho
* Re: [PATCH] gc --aggressive: make it really aggressive
2009-03-18 16:01 ` Johannes Schindelin
@ 2009-03-18 16:27 ` Teemu Likonen
2009-03-18 18:02 ` Nicolas Pitre
1 sibling, 0 replies; 13+ messages in thread
From: Teemu Likonen @ 2009-03-18 16:27 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On 2009-03-18 17:01 (+0100), Johannes Schindelin wrote:
>> The default was not to change the window or depth at all. As
>> suggested by Jon Smirl, Linus Torvalds and others, default to
>>
>> --window=250 --depth=250
> Guess what. This is still unresolved, and yet somebody else had to be
> bitten by 'git gc --aggressive' being everything but aggressive.
Pieter de Bie's tests seem to suggest that usually --window=50
--depth=50 gives about the same results as higher values:
http://vcscompare.blogspot.com/2008/06/git-repack-parameters.html
I don't understand the issue very well myself so I really can't say what
would be a good value. Anyway, I agree that it would be nice if "git
gc --aggressive" were aggressive and a user didn't need to know about
"git repack" and its cryptic, low-level options.
* Re: [PATCH] gc --aggressive: make it really aggressive
2009-03-18 16:01 ` Johannes Schindelin
2009-03-18 16:27 ` Teemu Likonen
@ 2009-03-18 18:02 ` Nicolas Pitre
1 sibling, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-03-18 18:02 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Wed, 18 Mar 2009, Johannes Schindelin wrote:
> Hi,
>
> On Thu, 6 Dec 2007, Johannes Schindelin wrote:
>
> >
> > The default was not to change the window or depth at all. As suggested
> > by Jon Smirl, Linus Torvalds and others, default to
> >
> > --window=250 --depth=250
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
>
> Guess what. This is still unresolved, and yet somebody else had to be
> bitten by 'git gc --aggressive' being everything but aggressive.
>
> So... I think it is high time to resolve the issue, either by applying
> this patch with a delay of over one year, or by the pack wizards trying to
> implement that 'never fall back to a worse delta' idea mentioned in this
> thread.
This is just a bit complicated to implement (cycle avoidance, etc).
> Although I suggest, really, that implying --depth=250 --window=250 (unless
> overridden by the config) with --aggressive is not at all wrong.
ACK.
Nicolas