git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git-gc "--aggressive" somewhat broken
@ 2007-07-06 20:19 ` Linus Torvalds
  2007-07-06 22:17   ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2007-07-06 20:19 UTC (permalink / raw)
  To: Theodore Tso, Junio C Hamano; +Cc: Git Mailing List


This flag seems misdesigned as-is.

It makes the window bigger (good, and aggressive), but it also enabled 
"-f".

Which sometimes causes _worse_ packing.

In particular, there's a gcc import at

	git://git.infradead.org/toolchain/gcc.git

which apparently packs down quite nicely to a 450MB pack, but when it gets 
repacked "aggressively", it blows up to 1.4GB. The reason? That pack was 
generated using --depth=100 --window=100 (I think).

So under certain circumstances, "--aggressive" is anything _but_ 
aggressive, and actually causes much worse packing.

If we want to be really aggressive, we migth decide to pass a new flag to 
pack-objects that does something closer to what "aggressive" was meant to 
do: it would use existing delta's if they exist, but _despite_ existing it 
could look if there are even better choices.

So right now we have:

 - default behaviour:

   always re-use existing deltas, don't look at alternatives at all.

   This is optimal for CPU/memory/IO usage, and is generally a good idea

 - "-f" (and as a result, the current bad "git gc --aggressive"):

   never re-use existing deltas, always look for new ones.

   This is good if you have reason to believe that the old choices are 
   bad, or you need to force re-generation of deltas (because you want to 
   force a new pack-file format, for example)

and the missing piece might be

 - "git pack-objects --aggressive":

   re-use existing deltas _and_ look for even better ones.

   This is good if all you're looking for is better packing.

Hmm?

		Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: git-gc "--aggressive" somewhat broken
  2007-07-06 20:19 ` git-gc "--aggressive" somewhat broken Linus Torvalds
@ 2007-07-06 22:17   ` Linus Torvalds
  2007-07-07  1:16     ` Linus Torvalds
  2007-07-09  3:46     ` Nicolas Pitre
  0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2007-07-06 22:17 UTC (permalink / raw)
  To: Theodore Tso, Junio C Hamano; +Cc: Git Mailing List



On Fri, 6 Jul 2007, Linus Torvalds wrote:
> 
> If we want to be really aggressive, we migth decide to pass a new flag to 
> pack-objects that does something closer to what "aggressive" was meant to 
> do: it would use existing delta's if they exist, but _despite_ existing it 
> could look if there are even better choices.

This is a totally untested patch that may or may not work.

The reason I say "may not work" is not just that I haven't really tested 
it, it's also because I haven't thought it through very well.

In particular, does this possibly cause infinite loops of delta chains? 
Probably. It would need code to explicitly make sure that we don't do 
that, but I couldn't even convince myself as to why we might not hit that 
case _already_ with delta re-use, so maybe there's something going that 
protects us against it.

The patch itself is trivial, except for hunk #2, which fixes up the fact 
that we didn't fill in the "entry->size" correctly for a delta entry (we 
left it at the *delta* size). It didn't use to matter, since the entry 
size wasn't ever used for anything, I think (with the possible exception 
of the object sorting).

Anyway, consider this a starting point for somebody else who wants to 
really try to look into this. But I do think that "git gc --aggressive" is 
broken as it stands now.

		Linus

---
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 3d396ca..89e9900 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -57,6 +57,7 @@ static struct object_entry *objects;
 static struct object_entry **written_list;
 static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
 
+static int aggressive;
 static int non_empty;
 static int no_reuse_delta, no_reuse_object;
 static int local;
@@ -1179,6 +1180,8 @@ static void check_object(struct object_entry *entry)
 			entry->delta = base_entry;
 			entry->delta_sibling = base_entry->delta_child;
 			base_entry->delta_child = entry;
+			entry->size = get_size_from_delta(p, &w_curs,
+				entry->in_pack_offset + entry->in_pack_header_size);
 			unuse_pack(&w_curs);
 			return;
 		}
@@ -1425,7 +1428,7 @@ static void find_deltas(struct object_entry **list, int window, int depth)
 		if (progress)
 			display_progress(&progress_state, processed);
 
-		if (entry->delta)
+		if (entry->delta && !aggressive)
 			/* This happens if we decided to reuse existing
 			 * delta from a pack.  "!no_reuse_delta &&" is implied.
 			 */
@@ -1760,6 +1763,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 				die("bad %s", arg);
 			continue;
 		}
+		if (!strcmp("--aggressive", arg)) {
+			aggressive = 1;
+			continue;
+		}
 		usage(pack_usage);
 	}
 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: git-gc "--aggressive" somewhat broken
  2007-07-06 22:17   ` Linus Torvalds
@ 2007-07-07  1:16     ` Linus Torvalds
  2007-07-09  3:46     ` Nicolas Pitre
  1 sibling, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2007-07-07  1:16 UTC (permalink / raw)
  To: Theodore Tso, Junio C Hamano; +Cc: Git Mailing List



On Fri, 6 Jul 2007, Linus Torvalds wrote:
> 
> This is a totally untested patch that may or may not work.

And by that I obviously mean "does not work".

I only tested on the gcc thing, and it causes an assertion error when 
trying to actually write out the deltas ("corrupt packed object").

I'm sure somebody can figure it out, 

		Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: git-gc "--aggressive" somewhat broken
  2007-07-06 22:17   ` Linus Torvalds
  2007-07-07  1:16     ` Linus Torvalds
@ 2007-07-09  3:46     ` Nicolas Pitre
  1 sibling, 0 replies; 4+ messages in thread
From: Nicolas Pitre @ 2007-07-09  3:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Theodore Tso, Junio C Hamano, Git Mailing List

On Fri, 6 Jul 2007, Linus Torvalds wrote:

> 
> 
> On Fri, 6 Jul 2007, Linus Torvalds wrote:
> > 
> > If we want to be really aggressive, we migth decide to pass a new flag to 
> > pack-objects that does something closer to what "aggressive" was meant to 
> > do: it would use existing delta's if they exist, but _despite_ existing it 
> > could look if there are even better choices.
> 
> This is a totally untested patch that may or may not work.
> 
> The reason I say "may not work" is not just that I haven't really tested 
> it, it's also because I haven't thought it through very well.
> 
> In particular, does this possibly cause infinite loops of delta chains? 
> Probably. It would need code to explicitly make sure that we don't do 
> that, but I couldn't even convince myself as to why we might not hit that 
> case _already_ with delta re-use, so maybe there's something going that 
> protects us against it.

There is.

> Anyway, consider this a starting point for somebody else who wants to 
> really try to look into this.

This is a real teaser.  But I have real work to do if I want to leave on 
vacation this summer, and therefore I'll then be on vacation.  So if I 
end up looking at it myself it will be in September.


Nicolas

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-07-09  3:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.LFD.0.999.0707061310390.8278@woody.linux-foundation.org >
2007-07-06 20:19 ` git-gc "--aggressive" somewhat broken Linus Torvalds
2007-07-06 22:17   ` Linus Torvalds
2007-07-07  1:16     ` Linus Torvalds
2007-07-09  3:46     ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).