From: "Jeff Whiteside" <jeff.m.whiteside@gmail.com>
To: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Jonathan Blanton" <jonathan.blanton@gmail.com>, git@vger.kernel.org
Subject: Re: malloc fails when dealing with huge files
Date: Wed, 10 Dec 2008 16:16:17 -0800
Message-ID: <3ab397d0812101616t770e2a8dj2150cc630946917@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.00.0812101121401.3340@localhost.localdomain>

i tried to do something like that over a year ago, having gotten the
insane idea that i wanted to version my whole hard drive.  binaries
were a huge problem.

checkouts were also a problem over slow connections because there is
no git-clone --resume, so if your connection is interrupted, you're
back at square one.  perhaps git-torrent will fix that.

git was designed to be line/code based more than file based.  let
me know if you find a better alternative to git for filesystems.

it's too bad there's no better way to keep resources tied to a
version by a sha1 while keeping them separate from the source.

On Wed, Dec 10, 2008 at 11:32 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 10 Dec 2008, Jonathan Blanton wrote:
>>
>> I'm using Git for a project that contains huge (multi-gigabyte) files.
>>  I need to track these files, but with some of the really big ones,
>> git-add aborts with the message "fatal: Out of memory, malloc failed".
>
> git is _really_ not designed for huge files.
>
> By design - good or bad - git does pretty much all single file operations
> with the whole file in memory as one single allocation.
>
> Now, some of that is hard to fix - or at least would generate much more
> complex code. The _particular_ case of "git add" could be fixed without
> undue pain, but it's not entirely trivial either.
>
> The main offender is probably "index_fd()" that just mmap's the whole file
> in one go and then calls write_sha1_file() which really expects it to be
> one single memory area both for the initial SHA1 create and for the
> compression and writing out of the result.
>
> Changing that to do big files in pieces would not be _too_ painful, but
> it's not just a couple of lines either.
>
> However, git performance with big files would never be wonderful, and
> things like "git diff" would still end up reading not just the whole file,
> but _both_versions_ at the same time. Marking the big files as being
> no-diff might help, though.
>
>
>                        Linus
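
For illustration, here is a minimal sketch of the "big files in pieces"
idea from the quoted message. It is written in Python rather than git's C
and is not how index_fd() or write_sha1_file() are implemented. It only
shows that a blob's SHA-1 and its zlib-compressed form can be produced
chunk by chunk instead of from a single in-memory buffer. The function
name hash_blob_streaming and the 1 MiB chunk size are assumptions made up
for this example; the "blob <size>\0" header is git's object format.

```python
import hashlib
import os
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def hash_blob_streaming(path, chunk_size=CHUNK_SIZE):
    """Return (hex object id, compressed size) for a file treated as a git blob.

    Hypothetical helper: reads the file in pieces so that memory use stays
    near chunk_size no matter how large the file is.
    """
    size = os.path.getsize(path)
    header = b"blob %d\0" % size      # git hashes this header before the content

    sha = hashlib.sha1(header)        # incremental SHA-1 state
    deflater = zlib.compressobj()     # incremental zlib stream
    compressed_len = len(deflater.compress(header))

    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha.update(chunk)
            # A real writer would append this output to the object file;
            # the sketch only counts the bytes.
            compressed_len += len(deflater.compress(chunk))

    compressed_len += len(deflater.flush())
    return sha.hexdigest(), compressed_len
```

The object id returned should match what "git hash-object <file>" prints
for the same file, since both hash the same header and content bytes.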

Thread overview: 4+ messages
2008-12-10 15:42 malloc fails when dealing with huge files Jonathan Blanton
2008-12-10 19:32 ` Linus Torvalds
2008-12-11  0:16   ` Jeff Whiteside [this message]
2008-12-11  9:11   ` Johannes Schindelin
