From: Jonathan Nieder <jrnieder@gmail.com>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Shawn Pearce <spearce@spearce.org>,
weigelt@metux.de, git@vger.kernel.org
Subject: Re: large files and low memory
Date: Tue, 5 Oct 2010 15:34:50 -0500
Message-ID: <20101005203450.GA2096@burratino>
In-Reply-To: <alpine.LFD.2.00.1010051518570.3107@xanadu.home>

Nicolas Pitre wrote:
> You can't do a one-pass calculation. The first one is required to
> compute the SHA1 of the file being added, and if that corresponds to an
> object that we already have then the operation stops right there as
> there is actually nothing to do.

Ah. Thanks for the reminder.

> In the case of big files, what we need to do is to stream the file data
> in, compute the SHA1 and deflate it, in order to stream it out into a
> temporary file, then rename it according to the final SHA1. This would
> allow Git to work with big files, but of course it won't be possible to
> know if the object corresponding to the file is already known until all
> the work has been done, possibly just to throw it away.
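A rough Python sketch of that single-pass approach follows. It is illustrative only. The file is read once, and each chunk is fed to both a SHA-1 and a zlib compressor. The compressed stream goes to a temporary file in the objects directory, which is renamed to its final `xx/yyyy...` path once the name is known. The function name and the "tmp_obj_" prefix are hypothetical, and packfiles are ignored. The existence check happens only at the end, which is exactly the wasted-work case described above.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob_one_pass(path, objects_dir, chunk_size=64 * 1024):
    """Hypothetical sketch: hash, deflate and store `path` in one read.

    The object name is only known at the end, so compressed data goes to a
    temporary file that is renamed into place afterwards. If an object of
    that name already exists, the temporary file is simply discarded.
    """
    size = os.path.getsize(path)
    header = b"blob %d\x00" % size
    h = hashlib.sha1(header)
    z = zlib.compressobj()
    os.makedirs(objects_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=objects_dir, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            # The header is part of both the hashed and the deflated stream.
            out.write(z.compress(header))
            for chunk in iter(lambda: src.read(chunk_size), b""):
                h.update(chunk)
                out.write(z.compress(chunk))
            out.write(z.flush())
        hexsha = h.hexdigest()
        final = os.path.join(objects_dir, hexsha[:2], hexsha[2:])
        if os.path.exists(final):
            os.remove(tmp)  # object already known: the work is thrown away
        else:
            os.makedirs(os.path.dirname(final), exist_ok=True)
            os.replace(tmp, final)
        return hexsha
    except BaseException:
        # Do not leave a half-written temporary file behind on failure.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

The trade-off is visible in the structure. Memory stays bounded by `chunk_size` no matter how large the file is, but a file that is already in the repository still costs a full read plus a full deflate.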

To make sure I understand correctly: are you suggesting that for big
files we should skip the first pass?

I suppose that makes sense. For small files, git has historically had
to expect that a patch application tool will produce a postimage
matching an object it already has, so the early existence check pays
off. For typical big files, though:
- once you've computed the SHA1, you've already invested a noticeable
amount of time.
- emailing patches around is difficult, making "git am" and similar
tools less important
- hopefully git or zlib can notice when files are incompressible,
so that deflating them costs less in that case.
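One way the "notice when files are incompressible" idea could work is sketched below. This is a hypothetical heuristic, not something git or zlib is known to do here. It deflates a small sample at the cheapest real level, and if the sample barely shrinks, it picks zlib level 0 (store only) for the rest of the data. The function name and the 0.95 threshold are arbitrary assumptions.

```python
import zlib


def choose_zlib_level(sample, threshold=0.95,
                      default_level=zlib.Z_DEFAULT_COMPRESSION):
    """Hypothetical heuristic: pick a zlib level from a sample of the data.

    Deflate the sample quickly. If it barely shrinks, the data is probably
    already compressed (video, archives, ...), and level 0 (store only)
    avoids spending CPU on deflate for little gain.
    """
    if not sample:
        return default_level
    compressed = zlib.compress(sample, 1)  # level 1: cheapest real deflate
    ratio = len(compressed) / len(sample)
    return 0 if ratio >= threshold else default_level
```

For example, the level returned for the first chunk read from a file could be passed to `zlib.compressobj(level)` before the rest of the file is streamed through it. Level 0 still produces a valid zlib stream, so readers need no changes.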