git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] apply delta depth bias to already deltified objects
@ 2007-07-12  6:38 Nicolas Pitre
  2007-07-12 15:20 ` Brian Downing
  2007-07-12 18:33 ` Nicolas Pitre
  0 siblings, 2 replies; 6+ messages in thread
From: Nicolas Pitre @ 2007-07-12  6:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


We already apply a bias on the initial delta attempt with max_size being 
a function of the base object depth.  This already has the effect of 
favoring shallower deltas even if deeper deltas could be smaller, and 
therefore creating a wider delta tree (see commits 4e8da195 and 
c3b06a69).

This principle should also be applied to all delta attempts for the same 
object and not only the first attempt.  With this the criteria for the 
best delta is not only its size but also its depth, so that a shallower 
delta might be selected even if it is larger than a deeper one.  Even if 
some deltas get larger, they allow for wider delta trees making the 
depth limit less quickly reached and therefore better deltas can be 
subsequently found, keeping the resulting pack size equal or even 
smaller.  Runtime access to the pack should also benefit from shallower 
deltas.

Testing on the GIT repo before this patch provided those numbers:

$ git repack -a -f --depth=10
Total 56534 (delta 39441), reused 0 (delta 0)
29.38user 0.33system 0:29.73elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+20996minor)pagefaults 0swaps

$ git verify-pack -v
chain length = 1: 3150 objects
chain length = 2: 2713 objects
chain length = 3: 2441 objects
chain length = 4: 2205 objects
chain length = 5: 2014 objects
chain length = 6: 1895 objects
chain length = 7: 1802 objects
chain length = 8: 1997 objects
chain length = 9: 2710 objects
chain length = 10: 18514 objects

Pack size is 18273039 bytes.

With this patch:

$ git repack -a -f --depth=10
Total 56534 (delta 39370), reused 0 (delta 0)
24.32user 0.35system 0:24.78elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19142minor)pagefaults 0swaps

$ git verify-pack -v
chain length = 1: 4059 objects
chain length = 2: 4048 objects
chain length = 3: 3775 objects
chain length = 4: 3428 objects
chain length = 5: 3007 objects
chain length = 6: 2793 objects
chain length = 7: 2701 objects
chain length = 8: 2988 objects
chain length = 9: 3935 objects
chain length = 10: 8636 objects

Pack size is 16311008 bytes.

Summary: this patch makes for slightly faster repacks, a smaller pack, 
and a much lower count of object hanging on the max delta depth.

Same repo again with the default depth of 50:

Before:
Total 56534 (delta 39545), reused 0 (delta 0)
23.01user 0.36system 0:23.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21017minor)pagefaults 0swaps
chain length = 1: 2882 objects
chain length = 2: 2443 objects
chain length = 3: 2157 objects
[...]
chain length = 48: 505 objects
chain length = 49: 782 objects
chain length = 50: 1802 objects
Pack size: 14651260 bytes.

After:

Total 56534 (delta 39596), reused 0 (delta 0)
22.98user 0.20system 0:23.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+20560minor)pagefaults 0swaps
chain length = 1: 2985 objects
chain length = 2: 2625 objects
chain length = 3: 2282 objects
[...]
chain length = 48: 477 objects
chain length = 49: 564 objects
chain length = 50: 392 objects
Pack size: 14383165 bytes.

With the Linux kernel repo:

Before:

Total 531582 (delta 431824), reused 0 (delta 0)
276.66user 3.64system 4:45.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+149606minor)pagefaults 0swaps
chain length = 1: 40779 objects
chain length = 2: 35650 objects
chain length = 3: 30226 objects
chain length = 4: 25619 objects
chain length = 5: 21709 objects
[...]
chain length = 46: 2223 objects
chain length = 47: 3263 objects
chain length = 48: 3175 objects
chain length = 49: 2128 objects
chain length = 50: 6413 objects
Pack size: 155680720 bytes.

After:

Total 531582 (delta 432537), reused 0 (delta 0)
270.31user 3.15system 4:35.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+140517minor)pagefaults 0swaps
chain length = 1: 41555 objects
chain length = 2: 36919 objects
chain length = 3: 31217 objects
chain length = 4: 26762 objects
chain length = 5: 22348 objects
[...]
chain length = 46: 1842 objects
chain length = 47: 1800 objects
chain length = 48: 1483 objects
chain length = 49: 1216 objects
chain length = 50: 1310 objects
Pack size: 154421656 bytes.


Signed-off-by: Nicolas Pitre <nico@cam.org>
---

This apparently makes BRian's patological case worse (although better 
than before his same-size-shallower patch), but I think that the 
improvement in the general case is worth it.  Even Brian's pack gets 
smaller so...

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 54b9d26..1eb86ed 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1332,12 +1332,14 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 
 	/* Now some size filtering heuristics. */
 	trg_size = trg_entry->size;
-	max_size = trg_size/2 - 20;
+	if (!trg_entry->delta)
+		max_size = trg_size/2 - 20;
+	else
+		max_size = trg_entry->delta_size * max_depth /
+				(max_depth - trg_entry->depth + 1);
 	max_size = max_size * (max_depth - src_entry->depth) / max_depth;
 	if (max_size == 0)
 		return 0;
-	if (trg_entry->delta && trg_entry->delta_size <= max_size)
-		max_size = trg_entry->delta_size;
 	src_size = src_entry->size;
 	sizediff = src_size < trg_size ? trg_size - src_size : 0;
 	if (sizediff >= max_size)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] apply delta depth bias to already deltified objects
  2007-07-12  6:38 [PATCH] apply delta depth bias to already deltified objects Nicolas Pitre
@ 2007-07-12 15:20 ` Brian Downing
  2007-07-12 16:27   ` Nicolas Pitre
  2007-07-12 18:33 ` Nicolas Pitre
  1 sibling, 1 reply; 6+ messages in thread
From: Brian Downing @ 2007-07-12 15:20 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

On Thu, Jul 12, 2007 at 02:38:30AM -0400, Nicolas Pitre wrote:
> This apparently makes BRian's patological case worse (although better 
> than before his same-size-shallower patch), but I think that the 
> improvement in the general case is worth it.  Even Brian's pack gets 
> smaller so...

I've found why this makes my case worse, and I think it's correctable
and will benefit everything when fixed:

Let's say we've currently got a delta match of 11 bytes at depth 5.
So trg_entry->delta_size = 11 and trg_entry->depth = 5.  max_depth is
100.

Now let's say the next object we're comparing against is at depth 2
(src_entry->depth = 2).  Even if we can find a delta of the same size
we should take it.

Now, with Nico's new patch:

		max_size = trg_entry->delta_size * max_depth /
				(max_depth - trg_entry->depth + 1);

max_size is now 11.  So far so good.

Now, however, the other bias happens:

	max_size = max_size * (max_depth - src_entry->depth) / max_depth;

    max_size = 11 * (100 - 2) / 100;
    max_size = 1078 / 100;
    max_size = 10;

This was okay when max_size was always (trg_size/2 - 20) here, but now
it's cutting it off too much.  max_size is now 10, and we can't make
a better depth match of the same size anymore.

I think the second bias equation should be scaled so as not to take
effect unless (src_entry->depth [+ 1?] > trg_entry->depth).

Other than this flaw I think this patch looks great.

-bcd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] apply delta depth bias to already deltified objects
  2007-07-12 15:20 ` Brian Downing
@ 2007-07-12 16:27   ` Nicolas Pitre
  2007-07-12 16:44     ` Brian Downing
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2007-07-12 16:27 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

On Thu, 12 Jul 2007, Brian Downing wrote:

> On Thu, Jul 12, 2007 at 02:38:30AM -0400, Nicolas Pitre wrote:
> > This apparently makes BRian's patological case worse (although better 
> > than before his same-size-shallower patch), but I think that the 
> > improvement in the general case is worth it.  Even Brian's pack gets 
> > smaller so...
> 
> I've found why this makes my case worse, and I think it's correctable
> and will benefit everything when fixed:
> 
> Let's say we've currently got a delta match of 11 bytes at depth 5.
> So trg_entry->delta_size = 11 and trg_entry->depth = 5.  max_depth is
> 100.
> 
> Now let's say the next object we're comparing against is at depth 2
> (src_entry->depth = 2).  Even if we can find a delta of the same size
> we should take it.
> 
> Now, with Nico's new patch:
> 
> 		max_size = trg_entry->delta_size * max_depth /
> 				(max_depth - trg_entry->depth + 1);
> 
> max_size is now 11.  So far so good.
> 
> Now, however, the other bias happens:
> 
> 	max_size = max_size * (max_depth - src_entry->depth) / max_depth;
> 
>     max_size = 11 * (100 - 2) / 100;
>     max_size = 1078 / 100;
>     max_size = 10;

Hmmm... Integer truncation errors.

In theory, the allowed max_size should be slightly higher than what we 
got in the depth 5 case because this case is less deep.  So...

    max_size = trg_entry->delta_size * max_depth /
                    (max_depth - trg_entry->depth + 1);
    max_size = 11 * 100 / (100 - 5 + 1) = 11.4583

    max_size = max_size * (max_depth - src_entry->depth) / max_depth;
    max_size = 11.4583 * (100 - 2) / 100 = 11.2292

So the max_size, because the depth is less, is slightly higher.

> This was okay when max_size was always (trg_size/2 - 20) here, but now
> it's cutting it off too much.  max_size is now 10, and we can't make
> a better depth match of the same size anymore.
> 
> I think the second bias equation should be scaled so as not to take
> effect unless (src_entry->depth [+ 1?] > trg_entry->depth).

Better yet, the integer truncation error should be compensated for, with 
this:

    max_size =
        (trg_entry->delta_size * max_depth + max_depth - trg_entry->depth) /
                    (max_depth - trg_entry->depth + 1);


Nicolas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] apply delta depth bias to already deltified objects
  2007-07-12 16:27   ` Nicolas Pitre
@ 2007-07-12 16:44     ` Brian Downing
  2007-07-12 18:07       ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Downing @ 2007-07-12 16:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

On Thu, Jul 12, 2007 at 12:27:02PM -0400, Nicolas Pitre wrote:
> Better yet, the integer truncation error should be compensated for, with 
> this:
> 
>     max_size =
>         (trg_entry->delta_size * max_depth + max_depth - trg_entry->depth) /
>                     (max_depth - trg_entry->depth + 1);

Yep, with this, my degenerate case seems to find the optimum solution
(depth ~ 65) even at crazy maximum depths like 1000.

Looks good to me.

-bcd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] apply delta depth bias to already deltified objects
  2007-07-12 16:44     ` Brian Downing
@ 2007-07-12 18:07       ` Nicolas Pitre
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Pitre @ 2007-07-12 18:07 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

On Thu, 12 Jul 2007, Brian Downing wrote:

> On Thu, Jul 12, 2007 at 12:27:02PM -0400, Nicolas Pitre wrote:
> > Better yet, the integer truncation error should be compensated for, with 
> > this:
> > 
> >     max_size =
> >         (trg_entry->delta_size * max_depth + max_depth - trg_entry->depth) /
> >                     (max_depth - trg_entry->depth + 1);
> 
> Yep, with this, my degenerate case seems to find the optimum solution
> (depth ~ 65) even at crazy maximum depths like 1000.
> 
> Looks good to me.

Great.

Now to conclude this, I have a patch with much simpler math which I'll 
post rsn.


Nicolas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] apply delta depth bias to already deltified objects
  2007-07-12  6:38 [PATCH] apply delta depth bias to already deltified objects Nicolas Pitre
  2007-07-12 15:20 ` Brian Downing
@ 2007-07-12 18:33 ` Nicolas Pitre
  1 sibling, 0 replies; 6+ messages in thread
From: Nicolas Pitre @ 2007-07-12 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Downing, git

We already apply a bias on the initial delta attempt with max_size being 
a function of the base object depth.  This has the effect of favoring 
shallower deltas even if deeper deltas could be smaller, and therefore 
creating a wider delta tree (see commits 4e8da195 and c3b06a69).

This principle should also be applied to all delta attempts for the same 
object and not only the first attempt.  With this the criteria for the 
best delta is not only its size but also its depth, so that a shallower 
delta might be selected even if it is larger than a deeper one.  Even if 
some deltas get larger, they allow for wider delta trees making the 
depth limit less quickly reached and therefore better deltas can be 
subsequently found, keeping the resulting pack size even smaller.  
Runtime access to the pack should also benefit from shallower deltas.

Testing on different repositories showed slighter faster repacks, 
smaller resulting packs, and a much nicer curve for delta depth 
distribution with no more peak at the maximum depth level.  
Improvements are even more significant with smaller depth limits.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

No, I wonn't provide the full test results again.  Yes, they're 
different and even slightly better with this version, but I can't be 
bothered to run them all and wait for the results again.  You'll have to 
take my word or do your own.

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 54b9d26..b4f3e7c 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1303,6 +1303,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	struct object_entry *trg_entry = trg->entry;
 	struct object_entry *src_entry = src->entry;
 	unsigned long trg_size, src_size, delta_size, sizediff, max_size, sz;
+	unsigned ref_depth;
 	enum object_type type;
 	void *delta_buf;
 
@@ -1332,12 +1333,17 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 
 	/* Now some size filtering heuristics. */
 	trg_size = trg_entry->size;
-	max_size = trg_size/2 - 20;
-	max_size = max_size * (max_depth - src_entry->depth) / max_depth;
+	if (!trg_entry->delta) {
+		max_size = trg_size/2 - 20;
+		ref_depth = 1;
+	} else {
+		max_size = trg_entry->delta_size;
+		ref_depth = trg_entry->depth;
+	}
+	max_size = max_size * (max_depth - src_entry->depth) /
+						(max_depth - ref_depth + 1);
 	if (max_size == 0)
 		return 0;
-	if (trg_entry->delta && trg_entry->delta_size <= max_size)
-		max_size = trg_entry->delta_size;
 	src_size = src_entry->size;
 	sizediff = src_size < trg_size ? trg_size - src_size : 0;
 	if (sizediff >= max_size)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-07-12 18:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-12  6:38 [PATCH] apply delta depth bias to already deltified objects Nicolas Pitre
2007-07-12 15:20 ` Brian Downing
2007-07-12 16:27   ` Nicolas Pitre
2007-07-12 16:44     ` Brian Downing
2007-07-12 18:07       ` Nicolas Pitre
2007-07-12 18:33 ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).