git.vger.kernel.org archive mirror
* "malloc failed"
@ 2009-01-27 15:04 David Abrahams
  2009-01-27 15:29 ` Shawn O. Pearce
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: David Abrahams @ 2009-01-27 15:04 UTC (permalink / raw)
  To: git


I've been abusing Git for a purpose it wasn't intended to serve:
archiving a large number of files with many duplicates and
near-duplicates.  Every once in a while, when trying to do something
really big, it tells me "malloc failed" and bails out (I think it's
during "git add" but because of the way I issued the commands I can't
tell: it could have been a commit or a gc).  This is on a 64-bit linux
machine with 8G of ram and plenty of swap space, so I'm surprised.

Git is doing an amazing job at archiving and compressing all this stuff
I'm putting in it, but I have to do it a wee bit at a time or it craps
out.  Bug?

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: "malloc failed"
  2009-01-27 15:04 "malloc failed" David Abrahams
@ 2009-01-27 15:29 ` Shawn O. Pearce
  2009-01-27 15:32   ` David Abrahams
  2009-01-27 18:02 ` Johannes Schindelin
  2009-01-28  5:02 ` Jeff King
  2 siblings, 1 reply; 15+ messages in thread
From: Shawn O. Pearce @ 2009-01-27 15:29 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

David Abrahams <dave@boostpro.com> wrote:
> I've been abusing Git for a purpose it wasn't intended to serve:
> archiving a large number of files with many duplicates and
> near-duplicates.  Every once in a while, when trying to do something
> really big, it tells me "malloc failed" and bails out (I think it's
> during "git add" but because of the way I issued the commands I can't
> tell: it could have been a commit or a gc).  This is on a 64-bit linux
> machine with 8G of ram and plenty of swap space, so I'm surprised.
> 
> Git is doing an amazing job at archiving and compressing all this stuff
> I'm putting in it, but I have to do it a wee bit at a time or it craps
> out.  Bug?

No, not really.  Above you said you are "abusing git for a purpose
it wasn't intended to serve"...

Git was never designed to handle many large binary blobs of data.
It was mostly designed for source code, where the majority of the
data stored in it is some form of text file written by a human.

By their very nature these files need to be relatively short (e.g.
under 1 MB each) as no human can sanely maintain a text file that
large without breaking it apart into different smaller files (like
the source code for an operating system kernel).

As a result of this approach, the git code assumes it can malloc()
at least two blocks large enough for each file: one for the fully
decompressed content, and another for the fully compressed content.
Try doing git add on a large file and it's very likely malloc()
will fail due to ulimit issues, or you just don't have enough
memory/address space to go around.
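
A rough sketch of that two-buffer pattern, with made-up sizes (toy
code, not git's actual implementation):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <zlib.h>

  int main(void)
  {
      unsigned long len = 64UL * 1024 * 1024;   /* pretend file size */
      unsigned long bound;
      unsigned char *content, *compressed;
      z_stream stream;

      content = malloc(len);             /* block #1: the whole file */
      if (!content) {
          fprintf(stderr, "malloc failed\n");
          return 1;
      }
      memset(content, 'x', len);         /* stand-in for real file data */

      memset(&stream, 0, sizeof(stream));
      deflateInit(&stream, Z_BEST_COMPRESSION);
      bound = deflateBound(&stream, len);  /* worst-case deflated size */

      compressed = malloc(bound);        /* block #2: the whole result */
      if (!compressed) {
          fprintf(stderr, "malloc failed\n");
          return 1;
      }

      stream.next_in = content;
      stream.avail_in = (uInt)len;
      stream.next_out = compressed;
      stream.avail_out = (uInt)bound;
      deflate(&stream, Z_FINISH);        /* one-shot compress */
      printf("%lu -> %lu bytes\n", len, stream.total_out);

      deflateEnd(&stream);
      free(compressed);
      free(content);
      return 0;
  }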

git gc likewise needs a good chunk of memory, but it shouldn't
usually report "malloc failed".  In git gc, if a malloc fails, it
usually just prints a warning and degrades the quality of its data
compression.  But there are critical bookkeeping data structures
where we must be able to malloc the memory, and if those fail because
we've already exhausted the heap early on, then yes, it can fail too.

-- 
Shawn.


* Re: "malloc failed"
  2009-01-27 15:29 ` Shawn O. Pearce
@ 2009-01-27 15:32   ` David Abrahams
  0 siblings, 0 replies; 15+ messages in thread
From: David Abrahams @ 2009-01-27 15:32 UTC (permalink / raw)
  To: git


on Tue Jan 27 2009, "Shawn O. Pearce" <spearce-AT-spearce.org> wrote:

> David Abrahams <dave@boostpro.com> wrote:
>> I've been abusing Git for a purpose it wasn't intended to serve:
>> archiving a large number of files with many duplicates and
>> near-duplicates.  Every once in a while, when trying to do something
>> really big, it tells me "malloc failed" and bails out (I think it's
>> during "git add" but because of the way I issued the commands I can't
>> tell: it could have been a commit or a gc).  This is on a 64-bit linux
>> machine with 8G of ram and plenty of swap space, so I'm surprised.
>> 
>> Git is doing an amazing job at archiving and compressing all this stuff
>> I'm putting in it, but I have to do it a wee bit at a time or it craps
>> out.  Bug?
>
> No, not really.  Above you said you are "abusing git for a purpose
> it wasn't intended to serve"...

Absolutely; I want to be upfront about that :-)

> Git was never designed to handle many large binary blobs of data.

They're largely text blobs, although there's definitely a fair share
of binaries.

> It was mostly designed for source code, where the majority of the
> data stored in it is some form of text file written by a human.
>
> By their very nature these files need to be relatively short (e.g.
> under 1 MB each) as no human can sanely maintain a text file that
> large without breaking it apart into different smaller files (like
> the source code for an operating system kernel).
>
> As a result of this approach, the git code assumes it can malloc()
> at least two blocks large enough for each file: one for the fully
> decompressed content, and another for the fully compressed content.
> Try doing git add on a large file and it's very likely malloc()
> will fail due to ulimit issues, or you just don't have enough
> memory/address space to go around.

Oh, so maybe I'm getting hit by ulimit; I didn't think of that.  I could
raise my ulimit to try to get around this.
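
For reference, a quick sketch of checking and raising that limit from
C (the same knob the shell's "ulimit -v" controls; illustrative only):

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
      struct rlimit rl;

      if (getrlimit(RLIMIT_AS, &rl)) {   /* address-space limit */
          perror("getrlimit");
          return 1;
      }
      if (rl.rlim_cur == RLIM_INFINITY)
          printf("virtual memory: unlimited\n");
      else
          printf("virtual memory: %lu bytes\n",
                 (unsigned long)rl.rlim_cur);

      rl.rlim_cur = rl.rlim_max;         /* raise soft limit to hard cap */
      if (setrlimit(RLIMIT_AS, &rl)) {
          perror("setrlimit");
          return 1;
      }
      return 0;
  }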

> git gc likewise needs a good chunk of memory, but it shouldn't
> usually report "malloc failed".  In git gc, if a malloc fails, it
> usually just prints a warning and degrades the quality of its data
> compression.  But there are critical bookkeeping data structures
> where we must be able to malloc the memory, and if those fail because
> we've already exhausted the heap early on, then yes, it can fail too.

Thanks much for that, and for reminding me about ulimit.

Cheers,

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: "malloc failed"
  2009-01-27 15:04 "malloc failed" David Abrahams
  2009-01-27 15:29 ` Shawn O. Pearce
@ 2009-01-27 18:02 ` Johannes Schindelin
  2009-01-28  5:02 ` Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Johannes Schindelin @ 2009-01-27 18:02 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

Hi,

On Tue, 27 Jan 2009, David Abrahams wrote:

> I've been abusing Git for a purpose it wasn't intended to serve: 
> archiving a large number of files with many duplicates and 
> near-duplicates.

Hah!  My first UGFWIINI contender!  Unfortunately, I listed that purpose 
explicitly already...

> Every once in a while, when trying to do something really big, it tells 
> me "malloc failed" and bails out (I think it's during "git add" but 
> because of the way I issued the commands I can't tell: it could have 
> been a commit or a gc).  This is on a 64-bit linux machine with 8G of 
> ram and plenty of swap space, so I'm surprised.

Yes, I am surprised, too.  I would expect that you were hit by some 
kind of arbitrary user-specifiable limit.  Haven't had time to look at 
the code, though.

Ciao,
Dscho


* Re: "malloc failed"
  2009-01-27 15:04 "malloc failed" David Abrahams
  2009-01-27 15:29 ` Shawn O. Pearce
  2009-01-27 18:02 ` Johannes Schindelin
@ 2009-01-28  5:02 ` Jeff King
  2009-01-28 21:53   ` David Abrahams
  2009-01-28 22:16   ` Pau Garcia i Quiles
  2 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2009-01-28  5:02 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

On Tue, Jan 27, 2009 at 10:04:42AM -0500, David Abrahams wrote:

> I've been abusing Git for a purpose it wasn't intended to serve:
> archiving a large number of files with many duplicates and
> near-duplicates.  Every once in a while, when trying to do something
> really big, it tells me "malloc failed" and bails out (I think it's
> during "git add" but because of the way I issued the commands I can't
> tell: it could have been a commit or a gc).  This is on a 64-bit linux
> machine with 8G of ram and plenty of swap space, so I'm surprised.
> 
> Git is doing an amazing job at archiving and compressing all this stuff
> I'm putting in it, but I have to do it a wee bit at a time or it craps
> out.  Bug?

How big is the repository? How big are the biggest files? I have a
3.5G repo with files ranging from a few bytes to about 180M. I've never
run into malloc problems or gone into swap on my measly 1G box.
How does your dataset compare?

As others have mentioned, git wasn't really designed specifically for
those sorts of numbers, but in the interests of performance, I find git
is usually pretty careful about not keeping too much useless stuff in
memory at one time.  And the fact that you can perform the same
operation a little bit at a time and achieve success implies to me there
might be a leak or some silly behavior that can be fixed.

It would help a lot if we knew the operation that was causing the
problem. Can you try to isolate the failed command next time it happens?

-Peff


* Re: "malloc failed"
  2009-01-28  5:02 ` Jeff King
@ 2009-01-28 21:53   ` David Abrahams
  2009-01-29  0:06     ` David Abrahams
  2009-01-28 22:16   ` Pau Garcia i Quiles
  1 sibling, 1 reply; 15+ messages in thread
From: David Abrahams @ 2009-01-28 21:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Wed, 28 Jan 2009 00:02:25 -0500, Jeff King <peff@peff.net> wrote:

> On Tue, Jan 27, 2009 at 10:04:42AM -0500, David Abrahams wrote:
>
>> I've been abusing Git for a purpose it wasn't intended to serve:
>> archiving a large number of files with many duplicates and
>> near-duplicates.  Every once in a while, when trying to do something
>> really big, it tells me "malloc failed" and bails out (I think it's
>> during "git add" but because of the way I issued the commands I can't
>> tell: it could have been a commit or a gc).  This is on a 64-bit linux
>> machine with 8G of ram and plenty of swap space, so I'm surprised.
>>
>> Git is doing an amazing job at archiving and compressing all this stuff
>> I'm putting in it, but I have to do it a wee bit at a time or it craps
>> out.  Bug?
>
> How big is the repository? How big are the biggest files? I have a
> 3.5G repo with files ranging from a few bytes to about 180M. I've never
> run into malloc problems or gone into swap on my measly 1G box.
> How does your dataset compare?

I'll try to do some research.  Gotta go pick up my boy now...

> As others have mentioned, git wasn't really designed specifically for
> those sorts of numbers, but in the interests of performance, I find git
> is usually pretty careful about not keeping too much useless stuff in
> memory at one time.  And the fact that you can perform the same
> operation a little bit at a time and achieve success implies to me there
> might be a leak or some silly behavior that can be fixed.
>
> It would help a lot if we knew the operation that was causing the
> problem. Can you try to isolate the failed command next time it happens?

root@recovery:/olympic/deuce/review# ulimit -v
unlimited
root@recovery:/olympic/deuce/review# git add hydra.bak/home-dave
fatal: Out of memory, malloc failed

The process never even gets close to my total installed RAM size, much less
my whole VM space size.

-- 
David Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: "malloc failed"
  2009-01-28  5:02 ` Jeff King
  2009-01-28 21:53   ` David Abrahams
@ 2009-01-28 22:16   ` Pau Garcia i Quiles
  2009-01-29  5:14     ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Pau Garcia i Quiles @ 2009-01-28 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: David Abrahams, git

On Wed, Jan 28, 2009 at 6:02 AM, Jeff King <peff@peff.net> wrote:

> How big is the repository? How big are the biggest files? I have a
> 3.5G repo with files ranging from a few bytes to about 180M. I've never
> run into malloc problems or gone into swap on my measly 1G box.
> How does your dataset compare?

I also have malloc problems, but only on Windows; on Linux it works fine.

My case: I have a 500 MB repository with a 1 GB working tree, with
binary files ranging from 100 KB to 50 MB and a few thousand source
files.

I have two branches ('master' and 'cmake'), and the latter has
undergone a huge hierarchy reorganization.

When I merge 'master' into 'cmake', if I use the 'subtree' strategy, it
works fine. If I use any other strategy, after a couple of minutes I
get a "malloc failed" and the tree is all messed up. As I said, on
Linux it works fine, so maybe it's a Windows-specific problem.

-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)


* Re: "malloc failed"
  2009-01-28 21:53   ` David Abrahams
@ 2009-01-29  0:06     ` David Abrahams
  2009-01-29  5:20       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: David Abrahams @ 2009-01-29  0:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git


on Wed Jan 28 2009, David Abrahams <dave-AT-boostpro.com> wrote:

> On Wed, 28 Jan 2009 00:02:25 -0500, Jeff King <peff@peff.net> wrote:
>> On Tue, Jan 27, 2009 at 10:04:42AM -0500, David Abrahams wrote:
>> 
>>> I've been abusing Git for a purpose it wasn't intended to serve:
>>> archiving a large number of files with many duplicates and
>>> near-duplicates.  Every once in a while, when trying to do something
>>> really big, it tells me "malloc failed" and bails out (I think it's
>>> during "git add" but because of the way I issued the commands I can't
>>> tell: it could have been a commit or a gc).  This is on a 64-bit linux
>>> machine with 8G of ram and plenty of swap space, so I'm surprised.
>>> 
>>> Git is doing an amazing job at archiving and compressing all this stuff
>>> I'm putting in it, but I have to do it a wee bit at a time or it craps
>>> out.  Bug?
>> 
>> How big is the repository? How big are the biggest files? I have a
>> 3.5G repo with files ranging from a few bytes to about 180M. I've never
>> run into malloc problems or gone into swap on my measly 1G box.
>> How does your dataset compare?
>
> I'll try to do some research.  Gotta go pick up my boy now...

Well, moving the 2.6G .dar backup binary out of the fileset seems to
have helped a little, not surprisingly :-P

I don't know whether anyone on this list should care about that failure
given the level of abuse I'm inflicting on Git, but keep in mind that
the system *does* have 8G of memory.  Conclude what you will from that,
I suppose!

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: "malloc failed"
  2009-01-28 22:16   ` Pau Garcia i Quiles
@ 2009-01-29  5:14     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2009-01-29  5:14 UTC (permalink / raw)
  To: Pau Garcia i Quiles; +Cc: David Abrahams, git

On Wed, Jan 28, 2009 at 11:16:32PM +0100, Pau Garcia i Quiles wrote:

> My case: I have a 500 MB repository with a 1 GB working tree, with
> binary files ranging from 100 KB to 50 MB and a few thousand source
> files.
> 
> I have two branches ('master' and 'cmake'), and the latter has
> undergone a huge hierarchy reorganization.
> 
> When I merge 'master' into 'cmake', if I use the 'subtree' strategy, it
> works fine. If I use any other strategy, after a couple of minutes I
> get a "malloc failed" and the tree is all messed up. As I said, on
> Linux it works fine, so maybe it's a Windows-specific problem.

Hmm. It very well might be the inexact rename detection allocating a
lot of memory. It does try to limit the amount of work, but the limit
is based on the number of files. So if you have a lot of huge files,
that might be fooling it.

Try setting merge.renamelimit to something small (but not '0', which
means "no limit"), e.g. "git config merge.renamelimit 100".

-Peff


* Re: "malloc failed"
  2009-01-29  0:06     ` David Abrahams
@ 2009-01-29  5:20       ` Jeff King
  2009-01-29  5:56         ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2009-01-29  5:20 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

On Wed, Jan 28, 2009 at 07:06:28PM -0500, David Abrahams wrote:

> Well, moving the 2.6G .dar backup binary out of the fileset seems to
> have helped a little, not surprisingly :-P

Ok, that _is_ big. ;) I wouldn't be surprised if there is some corner of
the code that barfs on a single object that doesn't fit in a signed
32-bit integer; I don't think we have any test coverage for stuff that
big.
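
To illustrate the kind of corner I mean, a toy example with a made-up
size (where exactly git truncates, if anywhere, is just a guess at
this point):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      unsigned long len = 2600000000UL;  /* ~2.6G, bigger than INT_MAX */
      int size = (int)len;               /* truncates where int is 32 bits */

      printf("len = %lu, size = %d\n", len, size);  /* size goes negative */
      /* malloc() takes a size_t, so the negative int turns back into
       * an enormous request, which fails: */
      if (!malloc((size_t)size))
          printf("Out of memory, malloc failed\n");
      return 0;
  }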

But it may also just be that we are going to try malloc'ing 2.6G, and
that's making some system limit unhappy.

> I don't know whether anyone on this list should care about that failure
> given the level of abuse I'm inflicting on Git, but keep in mind that
> the system *does* have 8G of memory.  Conclude what you will from that,
> I suppose!

Well, I think you said before that you were never getting close to using
up all your memory, which implies it's some system limit.

-Peff


* Re: "malloc failed"
  2009-01-29  5:20       ` Jeff King
@ 2009-01-29  5:56         ` Jeff King
  2009-01-29  7:53           ` Junio C Hamano
  2009-01-29 13:10           ` David Abrahams
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2009-01-29  5:56 UTC (permalink / raw)
  To: David Abrahams; +Cc: Junio C Hamano, git

On Thu, Jan 29, 2009 at 12:20:41AM -0500, Jeff King wrote:

> Ok, that _is_ big. ;) I wouldn't be surprised if there is some corner of
> the code that barfs on a single object that doesn't fit in a signed
> 32-bit integer; I don't think we have any test coverage for stuff that
> big.

Sure enough, that is the problem. With the patch below I was able to
"git add" and commit a 3 gigabyte file of random bytes (so even the
deflated object was 3G).

I think it might be worth applying as a general cleanup, but I have no
idea if other parts of the system might barf on such an object.

-- >8 --
Subject: [PATCH] avoid 31-bit truncation in write_loose_object

The size of the content we are adding may be larger than
2.1G (i.e., "git add gigantic-file"). Most of the code-path
to do so uses size_t or unsigned long to record the size,
but write_loose_object uses a signed int.

On platforms where "int" is 32-bits (which includes x86_64
Linux platforms), we end up passing malloc a negative size.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 360f7e5..8868b80 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2340,7 +2340,8 @@ static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
 static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 			      void *buf, unsigned long len, time_t mtime)
 {
-	int fd, size, ret;
+	int fd, ret;
+	size_t size;
 	unsigned char *compressed;
 	z_stream stream;
 	char *filename;
-- 
1.6.1.1.259.g8712.dirty


* Re: "malloc failed"
  2009-01-29  5:56         ` Jeff King
@ 2009-01-29  7:53           ` Junio C Hamano
  2009-01-29 13:10           ` David Abrahams
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2009-01-29  7:53 UTC (permalink / raw)
  To: Jeff King; +Cc: David Abrahams, git

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] avoid 31-bit truncation in write_loose_object
>
> The size of the content we are adding may be larger than
> 2.1G (i.e., "git add gigantic-file"). Most of the code-path
> to do so uses size_t or unsigned long to record the size,
> but write_loose_object uses a signed int.

Thanks.

I wonder if some analysis tool like sparse can help us spot these...


* Re: "malloc failed"
  2009-01-29  5:56         ` Jeff King
  2009-01-29  7:53           ` Junio C Hamano
@ 2009-01-29 13:10           ` David Abrahams
  2009-01-29 13:41             ` Andreas Ericsson
  2009-01-30  4:49             ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: David Abrahams @ 2009-01-29 13:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git


on Thu Jan 29 2009, Jeff King <peff-AT-peff.net> wrote:

> On Thu, Jan 29, 2009 at 12:20:41AM -0500, Jeff King wrote:
>
>> Ok, that _is_ big. ;) I wouldn't be surprised if there is some corner of
>> the code that barfs on a single object that doesn't fit in a signed
>> 32-bit integer; I don't think we have any test coverage for stuff that
>> big.
>
> Sure enough, that is the problem. With the patch below I was able to
> "git add" and commit a 3 gigabyte file of random bytes (so even the
> deflated object was 3G).
>
> I think it might be worth applying as a general cleanup, but I have no
> idea if other parts of the system might barf on such an object.
>
> -- >8 --
> Subject: [PATCH] avoid 31-bit truncation in write_loose_object
>
> The size of the content we are adding may be larger than
> 2.1G (i.e., "git add gigantic-file"). Most of the code-path
> to do so uses size_t or unsigned long to record the size,
> but write_loose_object uses a signed int.
>
> On platforms where "int" is 32-bits (which includes x86_64
> Linux platforms), we end up passing malloc a negative size.


Good work.  I don't know if this matters to you, but I think on a 32-bit
platform you'll find that size_t, which is supposed to be able to hold
the size of the largest representable *memory block*, is only 4 bytes
large:

  #include <limits.h>
  #include <stdio.h>

  int main()
  {
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    return 0;
  }

Prints "sizeof(size_t) = 4" on my core duo.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 360f7e5..8868b80 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2340,7 +2340,8 @@ static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
>  static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
>  			      void *buf, unsigned long len, time_t mtime)
>  {
> -	int fd, size, ret;
> +	int fd, ret;
> +	size_t size;
>  	unsigned char *compressed;
>  	z_stream stream;
>  	char *filename;

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: "malloc failed"
  2009-01-29 13:10           ` David Abrahams
@ 2009-01-29 13:41             ` Andreas Ericsson
  2009-01-30  4:49             ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Andreas Ericsson @ 2009-01-29 13:41 UTC (permalink / raw)
  To: David Abrahams; +Cc: Jeff King, Junio C Hamano, git

David Abrahams wrote:
> on Thu Jan 29 2009, Jeff King <peff-AT-peff.net> wrote:
> 
>> On Thu, Jan 29, 2009 at 12:20:41AM -0500, Jeff King wrote:
>>
>>> Ok, that _is_ big. ;) I wouldn't be surprised if there is some corner of
>>> the code that barfs on a single object that doesn't fit in a signed
>>> 32-bit integer; I don't think we have any test coverage for stuff that
>>> big.
>> Sure enough, that is the problem. With the patch below I was able to
>> "git add" and commit a 3 gigabyte file of random bytes (so even the
>> deflated object was 3G).
>>
>> I think it might be worth applying as a general cleanup, but I have no
>> idea if other parts of the system might barf on such an object.
>>
>> -- >8 --
>> Subject: [PATCH] avoid 31-bit truncation in write_loose_object
>>
>> The size of the content we are adding may be larger than
>> 2.1G (i.e., "git add gigantic-file"). Most of the code-path
>> to do so uses size_t or unsigned long to record the size,
>> but write_loose_object uses a signed int.
>>
>> On platforms where "int" is 32-bits (which includes x86_64
>> Linux platforms), we end up passing malloc a negative size.
> 
> 
> Good work.  I don't know if this matters to you, but I think on a 32-bit
> platform you'll find that size_t, which is supposed to be able to hold
> the size of the largest representable *memory block*, is only 4 bytes
> large:
> 
>   #include <limits.h>
>   #include <stdio.h>
> 
>   int main()
>   {
>     printf("sizeof(size_t) = %zu\n", sizeof(size_t));
>     return 0;
>   }
> 
> Prints "sizeof(size_t) = 4" on my core duo.
> 

It has nothing to do with type size, and everything to do with
signedness. A size_t cannot be negative, while an int can.
Making sure we use the correct signedness everywhere means
we double the capacity where negative values are clearly bogus,
such as in this case. On 32-bit platforms, the upper limit for
what git can handle is now 4GB, which is expected. To go beyond
that, we'd need to rework the algorithm so we handle chunks of
the data instead of the whole. Some day, that might turn out to
be necessary but today is not that day.
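
A quick check shows that doubling; on a 32-bit platform this prints
2147483647 (~2.1G, the old signed ceiling) and 4294967295 (~4G):

  #include <limits.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      printf("INT_MAX  = %d\n", INT_MAX);    /* signed 32-bit ceiling */
      printf("SIZE_MAX = %zu\n", SIZE_MAX);  /* unsigned ceiling */
      return 0;
  }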

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: "malloc failed"
  2009-01-29 13:10           ` David Abrahams
  2009-01-29 13:41             ` Andreas Ericsson
@ 2009-01-30  4:49             ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2009-01-30  4:49 UTC (permalink / raw)
  To: David Abrahams; +Cc: Junio C Hamano, git

On Thu, Jan 29, 2009 at 08:10:05AM -0500, David Abrahams wrote:

> Good work.  I don't know if this matters to you, but I think on a 32-bit
> platform you'll find that size_t, which is supposed to be able to hold
> the size of the largest representable *memory block*, is only 4 bytes
> large:

That should be fine; 32-bit systems can't deal with such large files
anyway, since we want to address the whole thing. Getting around that
would, as Andreas mentioned, involve dealing with large files in chunks,
something that would make the code a lot more complex.

So I think the answer is "tough, if you want files >4G get a 64-bit
machine". That's unreasonable for a file system to say, but I think it's
fine for git.

-Peff

