From: Nicolas Pitre <nico@fluxnic.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
	weigelt@metux.de, git@vger.kernel.org
Subject: Re: large files and low memory
Date: Tue, 05 Oct 2010 16:17:47 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1010051518570.3107@xanadu.home>
In-Reply-To: <20101004191657.GC6466@burratino>

On Mon, 4 Oct 2010, Jonathan Nieder wrote:

> Shawn Pearce wrote:
> 
> > This change only removes the deflate copy.  But due to the SHA-1
> > consistency issue I alluded to earlier, I think we're still making a
> > full copy of the file in memory before we SHA-1 it or deflate it.
> 
> Hmm, I _think_ we still use mmap for that (which is why 748af44c needs
> to compare the sha1 before and after).
> 
> But
> 
>  1) a one-pass calculation would presumably be a little (5%?) faster

You can't do a one-pass calculation.  The first pass is required to
compute the SHA1 of the file being added, and if that corresponds to an
object that we already have then the operation stops right there as
there is actually nothing to do.  The second pass is to deflate the
data, and recompute the SHA1 to make sure what we deflated and wrote
out is still the same data.
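
A minimal sketch of the two passes described above, written in Python
rather than Git's C and not taken from Git's source.  The function name
add_two_pass, the helper _chunks and the objects_dir parameter are
hypothetical; the "blob <size>\0" header and the two-character fan-out
directory follow Git's loose object layout.

```python
import hashlib
import os
import zlib

CHUNK = 1 << 16


def _chunks(path):
    """Yield the file's contents in fixed-size pieces."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                return
            yield buf


def add_two_pass(path, objects_dir):
    """Hypothetical helper: returns (sha1_hex, True if a new object was written)."""
    header = b"blob %d\0" % os.path.getsize(path)

    # Pass 1: hash only, and stop right there if the object already exists.
    h = hashlib.sha1(header)
    for buf in _chunks(path):
        h.update(buf)
    sha1 = h.hexdigest()
    dest = os.path.join(objects_dir, sha1[:2], sha1[2:])
    if os.path.exists(dest):
        return sha1, False

    # Pass 2: deflate, re-hashing exactly the bytes fed to the compressor.
    check = hashlib.sha1(header)
    z = zlib.compressobj()
    out = [z.compress(header)]  # compressed output kept in memory for brevity
    for buf in _chunks(path):
        check.update(buf)
        out.append(z.compress(buf))
    out.append(z.flush())
    if check.hexdigest() != sha1:
        raise RuntimeError("file changed while it was being added")

    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as f:
        f.write(b"".join(out))
    return sha1, True
```

The early return after pass 1 is what makes the first pass worth its
cost: for a file that is already stored, nothing is deflated at all.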

In the case of big files, what we need to do is to stream the file data 
in, compute the SHA1 and deflate it, in order to stream it out into a 
temporary file, then rename it according to the final SHA1.  This would 
allow Git to work with big files, but of course it won't be possible to 
know if the object corresponding to the file is already known until all 
the work has been done, possibly just to throw it away.  But normally 
big files are the minority.
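
The streaming approach above could look roughly like the following
Python sketch.  It is an illustration under stated assumptions, not
Git's implementation: add_streaming, its parameters and the "tmp_obj_"
prefix are hypothetical, and the size is assumed to be known up front
(e.g. from stat) because the object header containing it is hashed
before any file data.

```python
import hashlib
import os
import tempfile
import zlib


def add_streaming(src, size, objects_dir, chunk=1 << 16):
    """src: binary file-like object; size: its length in bytes.

    Returns (sha1_hex, True if a new object was written).
    """
    header = b"blob %d\0" % size
    h = hashlib.sha1(header)
    z = zlib.compressobj()
    os.makedirs(objects_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=objects_dir, prefix="tmp_obj_")
    try:
        total = 0
        with os.fdopen(fd, "wb") as out:
            out.write(z.compress(header))
            while True:
                buf = src.read(chunk)
                if not buf:
                    break
                total += len(buf)
                h.update(buf)                # hash and deflate the same bytes,
                out.write(z.compress(buf))   # one chunk at a time
            out.write(z.flush())
        if total != size:
            raise ValueError("stream length does not match declared size")

        # Only now is the final name known.
        sha1 = h.hexdigest()
        dest = os.path.join(objects_dir, sha1[:2], sha1[2:])
        if os.path.exists(dest):
            os.unlink(tmp)  # object already known: the work is thrown away
            return sha1, False
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.replace(tmp, dest)
        return sha1, True
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Memory use is bounded by the chunk size regardless of the file size.
The cost is the one noted above: the duplicate check can only happen
after everything has been hashed and deflated.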


Nicolas
