* [PATCH] avoid possible overflow in delta size filtering computation
@ 2009-03-24 19:56 Nicolas Pitre
  2009-03-24 20:20 ` Brandon Casey
  2009-03-25 12:15 ` Kjetil Barvik
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-24 19:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kjetil Barvik, git

On a 32-bit system, the maximum possible size for an object is less than 
4GB, while 64-bit systems may cope with larger objects.  Due to this 
limitation, variables holding object sizes are using an unsigned long 
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).

When large objects are encountered, and/or people play with large delta 
depth values, it is possible for the maximum allowed delta size 
computation to overflow, especially on a 32-bit system.  When this 
occurs, surviving result bits may represent a value much smaller than 
what it is supposed to be, or even zero.  This prevents some objects 
from being deltified although they do get deltified when a smaller depth 
limit is used.  Fix this by always performing a 64-bit multiplication.

Signed-off-by: Nicolas Pitre <nico@cam.org>

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 3a4bdbb..9fc3b35 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1293,7 +1293,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		max_size = trg_entry->delta_size;
 		ref_depth = trg->depth;
 	}
-	max_size = max_size * (max_depth - src->depth) /
+	max_size = (uint64_t)max_size * (max_depth - src->depth) /
 						(max_depth - ref_depth + 1);
 	if (max_size == 0)
 		return 0;
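
A small standalone sketch (not part of the patch itself) of the failure
mode, using made-up numbers: with a 32-bit unsigned long the
multiplication wraps around before the division, while the (uint64_t)
cast keeps the intermediate product intact, which is all the one-line
change above does.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Made-up illustration values: max_size plays the role of git's
	 * unsigned long on a 32-bit system, with depth numbers from a run
	 * using a very large --depth limit. */
	uint32_t max_size = 200u * 1024 * 1024;		/* a ~200MB delta candidate */
	unsigned int max_depth = 95000, src_depth = 0, ref_depth = 1;

	/* Unpatched: the product exceeds 2^32 and wraps before the division. */
	uint32_t wrapped = max_size * (max_depth - src_depth) /
					(max_depth - ref_depth + 1);

	/* Patched: promote to 64 bits before multiplying. */
	uint32_t fixed = (uint64_t)max_size * (max_depth - src_depth) /
					(max_depth - ref_depth + 1);

	printf("without cast: %u\n", (unsigned)wrapped);  /* far smaller than intended */
	printf("with cast:    %u\n", (unsigned)fixed);    /* 209715200, i.e. max_size */
	return 0;
}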


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-24 19:56 [PATCH] avoid possible overflow in delta size filtering computation Nicolas Pitre
@ 2009-03-24 20:20 ` Brandon Casey
  2009-03-24 20:52   ` Nicolas Pitre
  2009-03-25  0:39   ` Nicolas Pitre
  2009-03-25 12:15 ` Kjetil Barvik
  1 sibling, 2 replies; 10+ messages in thread
From: Brandon Casey @ 2009-03-24 20:20 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Kjetil Barvik, git

Nicolas Pitre wrote:
> On a 32-bit system, the maximum possible size for an object is less than 
> 4GB, while 64-bit systems may cope with larger objects.  Due to this 
> limitation, variables holding object sizes are using an unsigned long 
> type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).

FYI: except on 64-bit Windows, where long is still 32 bits AFAIK
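
A trivial way to check this (not from the original message); an LLP64
system such as 64-bit Windows is expected to print 4, an LP64 Unix
system 8:

#include <stdio.h>

int main(void)
{
	/* LLP64 (64-bit Windows): long stays 32 bits -> prints 4.
	 * LP64 (64-bit Linux/Unix): long is 64 bits  -> prints 8. */
	printf("sizeof(unsigned long) = %u\n", (unsigned)sizeof(unsigned long));
	return 0;
}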

-brandon


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-24 20:20 ` Brandon Casey
@ 2009-03-24 20:52   ` Nicolas Pitre
  2009-03-25  0:39   ` Nicolas Pitre
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-24 20:52 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Junio C Hamano, Kjetil Barvik, git

On Tue, 24 Mar 2009, Brandon Casey wrote:

> Nicolas Pitre wrote:
> > On a 32-bit system, the maximum possible size for an object is less than 
> > 4GB, while 64-bit systems may cope with larger objects.  Due to this 
> > limitation, variables holding object sizes are using an unsigned long 
> > type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
> 
> FYI: except on 64-bit Windows, where long is still 32 bits AFAIK

Whatever. Same issue.


Nicolas


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-24 20:20 ` Brandon Casey
  2009-03-24 20:52   ` Nicolas Pitre
@ 2009-03-25  0:39   ` Nicolas Pitre
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-25  0:39 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Junio C Hamano, Kjetil Barvik, git

On Tue, 24 Mar 2009, Brandon Casey wrote:

> Nicolas Pitre wrote:
> > On a 32-bit system, the maximum possible size for an object is less than 
> > 4GB, while 64-bit systems may cope with larger objects.  Due to this 
> > limitation, variables holding object sizes are using an unsigned long 
> > type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
> 
> FYI: except on 64-bit Windows, where long is still 32 bits AFAIK

Whatever. Same issue.


Nicolas


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-24 19:56 [PATCH] avoid possible overflow in delta size filtering computation Nicolas Pitre
  2009-03-24 20:20 ` Brandon Casey
@ 2009-03-25 12:15 ` Kjetil Barvik
  2009-03-25 16:18   ` Nicolas Pitre
  1 sibling, 1 reply; 10+ messages in thread
From: Kjetil Barvik @ 2009-03-25 12:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> writes:

> On a 32-bit system, the maximum possible size for an object is less than 
> 4GB, while 64-bit systems may cope with larger objects.  Due to this 
> limitation, variables holding object sizes are using an unsigned long 
> type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
>
> When large objects are encountered, and/or people play with large delta 
> depth values, it is possible for the maximum allowed delta size 
> computation to overflow, especially on a 32-bit system.  When this 
> occurs, surviving result bits may represent a value much smaller than 
> what it is supposed to be, or even zero.  This prevents some objects 
> from being deltified although they do get deltified when a smaller depth 
> limit is used.  Fix this by always performing a 64-bit multiplication.
>
> Signed-off-by: Nicolas Pitre <nico@cam.org>

  I added this patch and reran the 2 test cases from the table where
  --depth is 20000 and 95000, and got the following result:

    --depth=20000 => file size: 19126077  delta: 73814
    --depth=95000 => file size: 19126087  delta: 73814

  So, it seems that this patch almost fixed the issue.  But notice that
  the pack file was 10 bytes larger for the --depth=95000 case.

  I made a small perl script to compare the output from 'git verify-pack
  -v' of the 2 idx/pack files, and found the following difference(1)
  (first line from --depth=20000 case, second from --depth=95000):

  fe0a6f3e971373590714dbafd087b235ea60ac00  tree   9  19  18921247  731  96a3ec5789504e6d0f90c99fb1937af1ebd58e2d
  fe0a6f3e971373590714dbafd087b235ea60ac00  tree  20  29  18921247  730  12e560f7fb28558b15e3a2008fba860f9a4b2222

  'git show fe0a6f3e971373590714dbafd087b235ea60ac00' =>

tree fe0a6f3e971373590714dbafd087b235ea60ac00

Makefile
t0000-basic.sh
test-lib.sh

  'git show 96a3ec5789504e6d0f90c99fb1937af1ebd58e2d' =>

tree 96a3ec5789504e6d0f90c99fb1937af1ebd58e2d

Makefile
t0000-basic.sh
t0100-environment-names.sh
t0200-update-cache.sh
t0400-ls-files.sh
t0500-ls-files.sh
t1000-checkout-cache.sh
t1001-checkout-cache.sh
test-lib.sh

  'git show 12e560f7fb28558b15e3a2008fba860f9a4b2222' =>

tree 12e560f7fb28558b15e3a2008fba860f9a4b2222

Makefile
t0000-basic.sh
t0100-environment-names.sh
t0200-update-cache.sh
t0400-ls-files.sh
t0500-ls-files.sh
t1000-checkout-cache.sh
t1001-checkout-cache.sh
test-lib.sh

  -- kjetil

  1) there were lots of lines with different offsets, all of which were
     10 bytes larger in the --depth=95000 case.


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-25 12:15 ` Kjetil Barvik
@ 2009-03-25 16:18   ` Nicolas Pitre
  2009-03-25 16:34     ` Kjetil Barvik
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-25 16:18 UTC (permalink / raw)
  To: Kjetil Barvik; +Cc: Junio C Hamano, git

On Wed, 25 Mar 2009, Kjetil Barvik wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On a 32-bit system, the maximum possible size for an object is less than 
> > 4GB, while 64-bit systems may cope with larger objects.  Due to this 
> > limitation, variables holding object sizes are using an unsigned long 
> > type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
> >
> > When large objects are encountered, and/or people play with large delta 
> > depth values, it is possible for the maximum allowed delta size 
> > computation to overflow, especially on a 32-bit system.  When this 
> > occurs, surviving result bits may represent a value much smaller than 
> > what it is supposed to be, or even zero.  This prevents some objects 
> > from being deltified although they do get deltified when a smaller depth 
> > limit is used.  Fix this by always performing a 64-bit multiplication.
> >
> > Signed-off-by: Nicolas Pitre <nico@cam.org>
> 
>   I added this patch and reran the 2 test cases from the table where
>   --depth is 20000 and 95000, and got the following result:
> 
>     --depth=20000 => file size: 19126077  delta: 73814
>     --depth=95000 => file size: 19126087  delta: 73814
> 
>   So, it seems that this patch almost fixed the issue.  But notice that
>   the pack file was 10 bytes larger for the --depth=95000 case.
> 
>   I made a small perl script to compare the output from 'git verify-pack
>   -v' of the 2 idx/pack files, and found the following difference(1)
>   (first line from --depth=20000 case, second from --depth=95000):
> 
>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree   9  19  18921247  731  96a3ec5789504e6d0f90c99fb1937af1ebd58e2d
>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree  20  29  18921247  730  12e560f7fb28558b15e3a2008fba860f9a4b2222

OK.  Apparently, a different base object for that one delta was chosen 
between those two runs.

Is your machine SMP?


Nicolas


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-25 16:18   ` Nicolas Pitre
@ 2009-03-25 16:34     ` Kjetil Barvik
  2009-03-25 19:17       ` Nicolas Pitre
  0 siblings, 1 reply; 10+ messages in thread
From: Kjetil Barvik @ 2009-03-25 16:34 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 25 Mar 2009, Kjetil Barvik wrote:
>
>> Nicolas Pitre <nico@cam.org> writes:
>> 
>> > On a 32-bit system, the maximum possible size for an object is less than 
>> > 4GB, while 64-bit systems may cope with larger objects.  Due to this 
>> > limitation, variables holding object sizes are using an unsigned long 
>> > type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
>> >
>> > When large objects are encountered, and/or people play with large delta 
>> > depth values, it is possible for the maximum allowed delta size 
>> > computation to overflow, especially on a 32-bit system.  When this 
>> > occurs, surviving result bits may represent a value much smaller than 
>> > what it is supposed to be, or even zero.  This prevents some objects 
>> > from being deltified although they do get deltified when a smaller depth 
>> > limit is used.  Fix this by always performing a 64-bit multiplication.
>> >
>> > Signed-off-by: Nicolas Pitre <nico@cam.org>
>> 
>>   I added this patch and reran the 2 test cases from the table where
>>   --depth is 20000 and 95000, and got the following result:
>> 
>>     --depth=20000 => file size: 19126077  delta: 73814
>>     --depth=95000 => file size: 19126087  delta: 73814
>> 
>>   So, it seems that this patch almost fixed the issue.  But notice that
>>   the pack file was 10 bytes larger for the --depth=95000 case.
>> 
>>   I made a small perl script to compare the output from 'git verify-pack
>>   -v' of the 2 idx/pack files, and found the following difference(1)
>>   (first line from --depth=20000 case, second from --depth=95000):
>> 
>>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree   9  19  18921247  731  96a3ec5789504e6d0f90c99fb1937af1ebd58e2d
>>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree  20  29  18921247  730  12e560f7fb28558b15e3a2008fba860f9a4b2222
>
> OK.  Apparently, a different base object for that one delta was chosen 
> between those two runs.
>
> Is your machine SMP?

  kjetil ~$ uname -a
  Linux localhost 2.6.28.4 #26 SMP PREEMPT Tue Feb 10 17:07:14 CET 2009
  i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux

  -- kjetil


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-25 16:34     ` Kjetil Barvik
@ 2009-03-25 19:17       ` Nicolas Pitre
  2009-03-26  7:18         ` Kjetil Barvik
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-25 19:17 UTC (permalink / raw)
  To: Kjetil Barvik; +Cc: Junio C Hamano, git

On Wed, 25 Mar 2009, Kjetil Barvik wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Wed, 25 Mar 2009, Kjetil Barvik wrote:
> >
> >>   So, it seems that this patch almost fixed the issue.  But notice that
> >>   the pack file was 10 bytes larger for the --depth=95000 case.
> >> 
> >>   I made a small perl script to compare the output from 'git verify-pack
> >>   -v' of the 2 idx/pack files, and found the following difference(1)
> >>   (first line from --depth=20000 case, second from --depth=95000):
> >> 
> >>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree   9  19  18921247  731  96a3ec5789504e6d0f90c99fb1937af1ebd58e2d
> >>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree  20  29  18921247  730  12e560f7fb28558b15e3a2008fba860f9a4b2222
> >
> > OK.  Apparently, a different base object for that one delta was chosen 
> > between those two runs.
> >
> > Is your machine SMP?
> 
>   kjetil ~$ uname -a
>   Linux localhost 2.6.28.4 #26 SMP PREEMPT Tue Feb 10 17:07:14 CET 2009
>   i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux

Here you go.  If you want a perfectly deterministic repacking, you'll 
have to force the pack.threads config option to 1.


Nicolas


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-25 19:17       ` Nicolas Pitre
@ 2009-03-26  7:18         ` Kjetil Barvik
  2009-03-27  2:23           ` Nicolas Pitre
  0 siblings, 1 reply; 10+ messages in thread
From: Kjetil Barvik @ 2009-03-26  7:18 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 25 Mar 2009, Kjetil Barvik wrote:
>
>> Nicolas Pitre <nico@cam.org> writes:
>> 
>> > On Wed, 25 Mar 2009, Kjetil Barvik wrote:
>> >
>> >>   So, it seems that this patch almost fixed the issue.  But notice that
>> >>   the pack file was 10 bytes larger for the --depth=95000 case.
>> >> 
>> >>   I made a small perl script to compare the output from 'git verify-pack
>> >>   -v' of the 2 idx/pack files, and found the following difference(1)
>> >>   (first line from --depth=20000 case, second from --depth=95000):
>> >> 
>> >>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree   9  19  18921247  731  96a3ec5789504e6d0f90c99fb1937af1ebd58e2d
>> >>   fe0a6f3e971373590714dbafd087b235ea60ac00  tree  20  29  18921247  730  12e560f7fb28558b15e3a2008fba860f9a4b2222
>> >
>> > OK.  Apparently, a different base object for that one delta was chosen 
>> > between those two runs.
>> >
>> > Is your machine SMP?
>> 
>>   kjetil ~$ uname -a
>>   Linux localhost 2.6.28.4 #26 SMP PREEMPT Tue Feb 10 17:07:14 CET 2009
>>   i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux
>
> Here you go.  If you want a perfectly deterministic repacking, you'll 
> have to force the pack.threads config option to 1.

  OK, I have rerun the test again with pack.threads set to 1, and this
  time I got the exact same result(1) as above!  :-)

  And I think I can explain the reason for the same result: when the
  --window and/or --depth value is larger than (half?) the total number
  of objects (98438 in this case), I think the thread logic figures out
  that it can only run one thread (there are not enough objects for 2 or
  more threads).

  This also explains what I see when I run the repack command (without
  the pack.threads option): only 1 git process is running on 1 of the
  CPUs from the start, and the other is idle.
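
  A rough sketch of that hypothesis (this is not git's actual source,
  just an assumed partitioning rule in which each delta-search thread
  needs a chunk of at least ~2*window objects to be worthwhile):

#include <stdio.h>

/* Assumed rule: each thread wants at least ~2*window objects, so the
 * usable thread count is bounded by nr_objects / (2*window). */
static unsigned int usable_threads(unsigned int nr_objects,
                                   unsigned int window,
                                   unsigned int max_threads)
{
	unsigned int min_chunk = 2 * window;
	unsigned int n = min_chunk ? nr_objects / min_chunk : max_threads;

	if (n < 1)
		n = 1;
	if (n > max_threads)
		n = max_threads;
	return n;
}

int main(void)
{
	/* 98438 objects and 2 CPUs, as in the runs above. */
	printf("window=10:    %u thread(s)\n", usable_threads(98438, 10, 2));
	printf("window=95000: %u thread(s)\n", usable_threads(98438, 95000, 2));
	return 0;
}

  Under that assumption, a window (or chunk limit derived from it) in
  the same ballpark as the object count leaves work for only one thread.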

  ~~

  Give me a hint if you want some debug info from the 2 pack/idx files.

  -- kjetil

  1) The output from the 2 'git repack' commands did not show the
     "running delta with 2 threads" (or something similar) this time.
     I guess this is the sign that "pack.threads = 1" is working.


* Re: [PATCH] avoid possible overflow in delta size filtering computation
  2009-03-26  7:18         ` Kjetil Barvik
@ 2009-03-27  2:23           ` Nicolas Pitre
  0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2009-03-27  2:23 UTC (permalink / raw)
  To: Kjetil Barvik; +Cc: Junio C Hamano, git

On Thu, 26 Mar 2009, Kjetil Barvik wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > Here you go.  If you want a perfectly deterministic repacking, you'll 
> > have to force the pack.threads config option to 1.
> 
>   OK, I have rerun the test again with pack.threads set to 1, and this
>   time I got the exact same result(1) as above!  :-)

Good.

>   Give me a hint if you want some debug info from the 2 pack/idx files.

I think there is nothing else to debug.  Having some differences between 
successive threaded repacks is expected.


Nicolas


Thread overview: 10+ messages
2009-03-24 19:56 [PATCH] avoid possible overflow in delta size filtering computation Nicolas Pitre
2009-03-24 20:20 ` Brandon Casey
2009-03-24 20:52   ` Nicolas Pitre
2009-03-25  0:39   ` Nicolas Pitre
2009-03-25 12:15 ` Kjetil Barvik
2009-03-25 16:18   ` Nicolas Pitre
2009-03-25 16:34     ` Kjetil Barvik
2009-03-25 19:17       ` Nicolas Pitre
2009-03-26  7:18         ` Kjetil Barvik
2009-03-27  2:23           ` Nicolas Pitre
