* Re: malloc fails when dealing with huge files
2008-12-10 19:32 ` Linus Torvalds
@ 2008-12-11 0:16 ` Jeff Whiteside
2008-12-11 9:11 ` Johannes Schindelin
1 sibling, 0 replies; 4+ messages in thread
From: Jeff Whiteside @ 2008-12-11 0:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jonathan Blanton, git
I tried to do something like that over a year ago, having gotten the
insane idea that I wanted to version my whole hard drive. Binaries
were a huge problem.
Clones were also a problem over slow connections because there is
no git-clone --resume, so if your connection is interrupted, you're
back at square one. Perhaps git-torrent will fix that.
Git wasn't designed to be file-based so much as line/code-based. Let
me know if you find a better alternative to git for versioning filesystems.
It's too bad there isn't a better way to keep resources tagged to a
version by a SHA-1 while keeping them separate from the source.
On Wed, Dec 10, 2008 at 11:32 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 10 Dec 2008, Jonathan Blanton wrote:
>>
>> I'm using Git for a project that contains huge (multi-gigabyte) files.
>> I need to track these files, but with some of the really big ones,
>> git-add aborts with the message "fatal: Out of memory, malloc failed".
>
> git is _really_ not designed for huge files.
>
> By design - good or bad - git does pretty much all single file operations
> with the whole file in memory as one single allocation.
>
> Now, some of that is hard to fix - or at least would generate much more
> complex code. The _particular_ case of "git add" could be fixed without
> undue pain, but it's not entirely trivial either.
>
> The main offender is probably "index_fd()" that just mmap's the whole file
> in one go and then calls write_sha1_file() which really expects it to be
> one single memory area both for the initial SHA1 create and for the
> compression and writing out of the result.
>
> Changing that to do big files in pieces would not be _too_ painful, but
> it's not just a couple of lines either.
>
> However, git performance with big files would never be wonderful, and
> things like "git diff" would still end up reading not just the whole file,
> but _both_versions_ at the same time. Marking the big files as being
> no-diff might help, though.
>
>
> Linus
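For the curious, here is a minimal sketch of the "in pieces" approach for
the hashing half of what Linus describes: feed the "blob <size>\0" header
and then the file contents to SHA-1 incrementally, so that no single
allocation ever holds the whole file. This is illustrative rather than git
source; it borrows OpenSSL's SHA1 routines, and the chunk size, function
name and error handling are made up for the example. The compression half
of write_sha1_file() would need the same streaming treatment via zlib's
deflate().

/*
 * Sketch only, not git code: hash a blob in fixed-size chunks
 * instead of one big mmap.  The "blob <size>\0" header is what
 * git really prepends before hashing; everything else here is
 * illustrative.
 */
#include <stdio.h>
#include <openssl/sha.h>

#define CHUNK (8 * 1024 * 1024)    /* 8MB at a time; arbitrary */

static int hash_blob_chunked(FILE *fp, unsigned long size,
                             unsigned char sha1[20])
{
    static unsigned char buf[CHUNK];
    char hdr[32];
    int hdrlen;
    SHA_CTX ctx;

    /* git hashes "blob <decimal size>" plus the trailing NUL */
    hdrlen = snprintf(hdr, sizeof(hdr), "blob %lu", size) + 1;

    SHA1_Init(&ctx);
    SHA1_Update(&ctx, hdr, hdrlen);

    while (size) {
        size_t n = size < CHUNK ? size : CHUNK;
        if (fread(buf, 1, n, fp) != n)
            return -1;                 /* short read */
        SHA1_Update(&ctx, buf, n);     /* only CHUNK bytes live at once */
        size -= n;
    }
    SHA1_Final(sha1, &ctx);
    return 0;
}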
* Re: malloc fails when dealing with huge files
2008-12-10 19:32 ` Linus Torvalds
2008-12-11 0:16 ` Jeff Whiteside
@ 2008-12-11 9:11 ` Johannes Schindelin
1 sibling, 0 replies; 4+ messages in thread
From: Johannes Schindelin @ 2008-12-11 9:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jonathan Blanton, git
Hi,
On Wed, 10 Dec 2008, Linus Torvalds wrote:
> However, git performance with big files would never be wonderful, and
> things like "git diff" would still end up reading not just the whole
> file, but _both_versions_ at the same time. Marking the big files as
> being no-diff might help, though.
Makes me wonder if we should not have a default cut-off, say, 10MB, at
which files are automatically tagged with the no-diff attribute (unless
overridden explicitly in .gitattributes)?
Ciao,
Dscho
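Until something like that exists, the cut-off can be approximated by hand.
A hypothetical .gitattributes along these lines (the patterns are made up;
only the -diff attribute itself is real) makes diff treat the big files as
binary instead of reading both versions into memory:

    # mark known-huge files as no-diff by hand
    *.iso -diff
    *.img -diff

An automatic 10MB cut-off would presumably just set the same attribute
behind the user's back.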