From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jonathan Blanton <jonathan.blanton@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: malloc fails when dealing with huge files
Date: Wed, 10 Dec 2008 11:32:32 -0800 (PST)
Message-ID: <alpine.LFD.2.00.0812101121401.3340@localhost.localdomain>
In-Reply-To: <43c10b980812100742t3a65466yb9b7310bfedb2b18@mail.gmail.com>



On Wed, 10 Dec 2008, Jonathan Blanton wrote:
>
> I'm using Git for a project that contains huge (multi-gigabyte) files.
>  I need to track these files, but with some of the really big ones,
> git-add aborts with the message "fatal: Out of memory, malloc failed".

git is _really_ not designed for huge files.

By design - good or bad - git does pretty much all single file operations 
with the whole file in memory as one single allocation. 

Now, some of that is hard to fix - or at least would generate much more 
complex code. The _particular_ case of "git add" could be fixed without 
undue pain, but it's not entirely trivial either.

The main offender is probably "index_fd()" that just mmap's the whole file 
in one go and then calls write_sha1_file() which really expects it to be 
one single memory area both for the initial SHA1 create and for the 
compression and writing out of the result.

Changing that to do big files in pieces would not be _too_ painful, but 
it's not just a couple of lines either.
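
To illustrate what "in pieces" could look like, here is a minimal sketch in
Python (not git's actual C code, and not a patch). It assumes the usual
loose-object layout: the object name is the SHA-1 of "blob <size>\0"
followed by the contents, and the stored bytes are the zlib deflate of the
same stream. The name stream_blob() and the 1 MiB chunk size are
hypothetical choices for this example only.

```python
import hashlib
import io
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative choice


def stream_blob(src, size, dst, chunk_size=CHUNK_SIZE):
    """Hash and deflate `size` bytes from `src` one chunk at a time.

    Writes the compressed object bytes to `dst` and returns the hex SHA-1.
    The size must be known up front because it is part of the header.
    """
    header = b"blob %d\0" % size
    sha = hashlib.sha1(header)
    deflater = zlib.compressobj()
    dst.write(deflater.compress(header))

    remaining = size
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise IOError("input ended before the declared size was read")
        sha.update(chunk)                    # incremental SHA-1
        dst.write(deflater.compress(chunk))  # incremental deflate
        remaining -= len(chunk)

    dst.write(deflater.flush())
    return sha.hexdigest()


if __name__ == "__main__":
    out = io.BytesIO()
    # A tiny chunk size forces several iterations even on a small input.
    print(stream_blob(io.BytesIO(b"hello\n"), 6, out, chunk_size=4))
```

Only one chunk is resident at any time. The catch is that the object name
is not known until the last chunk has been hashed, so a real implementation
would presumably have to write to a temporary file and rename it afterwards.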

However, git performance with big files would never be wonderful, and 
things like "git diff" would still end up reading not just the whole file, 
but _both_versions_ at the same time. Marking the big files as being 
no-diff might help, though.
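
As a sketch of that last point, and assuming "no-diff" means unsetting the
"diff" attribute from gitattributes(5), an entry like this would make git
treat matching paths as binary for diff purposes. The *.iso pattern is a
hypothetical example; substitute whatever matches the big files.

```
# .gitattributes (the *.iso pattern is only an illustration)
*.iso -diff
```

This only affects diff generation. It does nothing for the "git add"
allocation problem described above.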


			Linus
