Re: large files and low memory

From: Jonathan Nieder <jrnieder@gmail.com>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Shawn Pearce <spearce@spearce.org>,
	weigelt@metux.de, git@vger.kernel.org
Subject: Re: large files and low memory
Date: Tue, 5 Oct 2010 15:34:50 -0500
Message-ID: <20101005203450.GA2096@burratino>
In-Reply-To: <alpine.LFD.2.00.1010051518570.3107@xanadu.home>

Nicolas Pitre wrote:

> You can't do a one-pass calculation.  The first one is required to
> compute the SHA1 of the file being added, and if that corresponds to an 
> object that we already have then the operation stops right there as 
> there is actually nothing to do.

Ah.  Thanks for the reminder.

> In the case of big files, what we need to do is to stream the file data 
> in, compute the SHA1 and deflate it, in order to stream it out into a 
> temporary file, then rename it according to the final SHA1.  This would 
> allow Git to work with big files, but of course it won't be possible to 
> know if the object corresponding to the file is already known until all 
> the work has been done, possibly just to throw it away.
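
The single-pass scheme described above can be sketched roughly as
follows.  This is a minimal Python illustration, not git's actual code:
the names stream_blob, CHUNK and the "tmp_obj_" prefix are made up for
the example, and it assumes the usual loose-object layout (a
"blob <size>\0" header, with the object stored under objects/xx/yyyy...).
It also assumes the file does not change size while it is being read.

```python
import hashlib
import os
import tempfile
import zlib

CHUNK = 1 << 20  # read 1 MiB at a time so memory use stays bounded


def stream_blob(path, objects_dir):
    """Hash and deflate `path` in one pass; return (hex_sha1, was_new)."""
    size = os.path.getsize(path)
    header = b"blob %d\0" % size          # the header is hashed and stored too
    sha = hashlib.sha1(header)
    deflater = zlib.compressobj()
    os.makedirs(objects_dir, exist_ok=True)
    # Write to a temporary file first: the final name is not known
    # until the whole file has been hashed.
    fd, tmp = tempfile.mkstemp(prefix="tmp_obj_", dir=objects_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(deflater.compress(header))
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                sha.update(chunk)
                out.write(deflater.compress(chunk))
            out.write(deflater.flush())
        name = sha.hexdigest()
        final = os.path.join(objects_dir, name[:2], name[2:])
        if os.path.exists(final):
            # Object already known: all the deflate work is thrown away.
            os.unlink(tmp)
            return name, False
        os.makedirs(os.path.dirname(final), exist_ok=True)
        os.rename(tmp, final)
        return name, True
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Memory use is bounded by CHUNK regardless of file size; the price is
that the "already have it" check can only happen after the deflate.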

To make sure I understand correctly: are you suggesting that for big
files we should skip the first pass?

I suppose that makes sense: for small files, using a patch application
tool to reach a postimage that matches an existing object is something
git historically needed to expect, but for typical big files:

 - once you've computed the SHA1, you've already invested a noticeable
   amount of time;
 - emailing patches around is difficult, making "git am" etc. less important;
 - hopefully git or zlib can notice when files are incompressible,
   making the deflate not cost so much in that case.
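
On the last point, one way such a check could look is sketched below in
Python.  This is purely an assumption for illustration: deflate_ratio,
pick_level and the 0.95 threshold are hypothetical, and nothing here
claims that git or zlib works this way.  The idea is to run a cheap
trial compression on a sample of the file and fall back to storing the
data uncompressed (zlib level 0) when the sample barely shrinks.

```python
import zlib


def deflate_ratio(sample, level=1):
    """Return compressed size / original size for a sample of the file."""
    if not sample:
        return 1.0
    return len(zlib.compress(sample, level)) / len(sample)


def pick_level(sample, threshold=0.95):
    """Hypothetical policy: use level 0 (store only) when a fast trial
    compression saves less than 5%, else zlib's default level."""
    if deflate_ratio(sample) >= threshold:
        return 0
    return zlib.Z_DEFAULT_COMPRESSION
```

The chosen level could then be handed to zlib.compressobj() before
streaming the rest of the file.  A sample from the start of a file is
not always representative, so this is a heuristic at best.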
