git.vger.kernel.org archive mirror
From: Thomas Braun <thomas.braun@virtuell-zuhause.de>
To: "Stewart, Louis (IS)" <louis.stewart@ngc.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: EXT :Re: GIT and large files
Date: Tue, 20 May 2014 20:27:20 +0200
Message-ID: <1400610440.14137.18.camel@thomas-debian-x64>
In-Reply-To: <C755E6FBF6DC4447BEF161CE48BDE0BD2F0CD631@XMBVAG73.northgrum.com>

On Tuesday, 20.05.2014, at 17:24 +0000, Stewart, Louis (IS) wrote:
> Thanks for the reply.  I just read the intro to GIT and I am concerned
> about the part that it will copy the whole repository to the developers
> work area.  They really just need the one directory and files under
> that one directory. The history has TBs of data.
> 
> Lou
> 
> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com] 
> Sent: Tuesday, May 20, 2014 1:18 PM
> To: Stewart, Louis (IS)
> Cc: git@vger.kernel.org
> Subject: EXT :Re: GIT and large files
> 
> "Stewart, Louis (IS)" <louis.stewart@ngc.com> writes:
> 
> > Can GIT handle versioning of large 20+ GB files in a directory?
> 
> I think you can "git add" such files, push/fetch histories that
> contain such files over the wire, and "git checkout" such files, but
> naturally reading, processing and writing 20+GB would take some time. 
> In order to run operations that need to see the changes, e.g. "git log
> -p", a real content-level merge, etc., you would also need sufficient
> memory because we do things in-core.

You can keep a clone from fetching the whole history with the
--depth option of git clone.
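A minimal sketch of a shallow clone, assuming a POSIX shell with git and mktemp on the PATH. The scratch "origin" repository, file name and commit messages are made up for illustration; against a real server you would pass its URL instead of the file:// URL. Note that --depth only trims history: every file of the checked-out commit is still transferred.

```shell
#!/bin/sh
# Sketch: build a scratch repository with three commits, then clone it
# with --depth 1 so that only the newest commit is fetched.
set -e
work=$(mktemp -d)
git init -q "$work/origin"
for i in 1 2 3; do
    echo "version $i" > "$work/origin/file.dat"
    git -C "$work/origin" add file.dat
    git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "change $i"
done

# Use a file:// URL: for a plain local path git ignores --depth.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

full=$(git -C "$work/origin" rev-list --count HEAD)     # 3
commits=$(git -C "$work/shallow" rev-list --count HEAD) # 1
echo "origin: $full commits, shallow clone: $commits"
```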

The question is: what do you want to do with these 20 GB files?
Just store them in the repo and change them *very* occasionally?
For that you need a 64-bit build of git and enough RAM; 32 GB
does the trick here. All of this was done with git 1.9.1.

Doing some tests on my machine with a normal hard disk gives the
following (sorry, my locale is not C, i.e. LC_ALL != C):
$ time git add file.dat; time git commit -m "add file"; time git status

real    16m17.913s
user    13m3.965s
sys     0m22.461s
[master 15fa953] add file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file.dat

real    15m36.666s
user    13m26.962s
sys     0m16.185s
# On branch master
nothing to commit, working directory clean

real    11m58.936s
user    11m50.300s
sys     0m5.468s

$ ls -lh
-rw-r--r-- 1 thomas thomas 20G May 20 19:01 file.dat

So this works, but it isn't fast.

Playing some tricks with --assume-unchanged helps here:
$ git update-index --assume-unchanged file.dat
$ time git status
# On branch master
nothing to commit, working directory clean

real    0m0.003s
user    0m0.000s
sys     0m0.000s

This trick is only safe if you *know* that file.dat does not change.
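To see why the trick needs care, here is a minimal sketch in a scratch repository (assuming a POSIX shell with git on the PATH; the file name and contents are made up). It marks a committed file, shows that a later edit goes unnoticed, and then clears the flag again with --no-assume-unchanged.

```shell
#!/bin/sh
# Sketch in a throwaway repository; nothing here touches a real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "big payload" > file.dat
git add file.dat
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add file"

# Tell git to stop checking file.dat for changes.
git update-index --assume-unchanged file.dat
# "git ls-files -v" tags assume-unchanged entries with a lowercase letter.
marked=$(git ls-files -v file.dat)       # "h file.dat"

# The risk: an edit to the file now goes unnoticed by git status.
echo "edited" >> file.dat
hidden=$(git status --porcelain)         # empty

# Clear the flag and git sees the modification again.
git update-index --no-assume-unchanged file.dat
visible=$(git status --porcelain)        # " M file.dat"

printf '%s\n' "$marked" "[$hidden]" "$visible"
```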

And btw, I also set
$ cat .gitattributes
*.dat -delta
as delta compression should be skipped for these files in any case.
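A quick way to confirm that such a rule applies to a path is git check-attr. The sketch below assumes a POSIX shell with git on the PATH and uses a scratch repository; notes.txt is a made-up path included for contrast.

```shell
#!/bin/sh
# Sketch: check which paths the "*.dat -delta" rule covers.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '*.dat -delta\n' > .gitattributes

# check-attr works on path names; the files need not exist.
attr=$(git check-attr delta file.dat)    # "file.dat: delta: unset"
other=$(git check-attr delta notes.txt)  # "notes.txt: delta: unspecified"
printf '%s\n' "$attr" "$other"
```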

Pushing and pulling these files to and from a server needs some tweaking
on the server side; otherwise the occasional git gc might kill the box.
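The mail does not say which tweaks are meant. As a hedged sketch only (an assumption, not the author's actual setup), these are the kinds of git config settings that usually limit repacking cost. The values are illustrative guesses to be sized to the server's RAM, and a scratch bare repository stands in for the hosted one.

```shell
#!/bin/sh
# Sketch: illustrative server-side settings, applied to a scratch bare repo.
set -e
repo=$(mktemp -d)/bigfiles.git   # hypothetical stand-in for the hosted repo
git init -q --bare "$repo"

# Disable automatic "git gc --auto" (e.g. after pushes); run gc by hand
# at a quiet time instead.
git -C "$repo" config gc.auto 0
# Cap the pack window memory each pack-objects thread may use.
git -C "$repo" config pack.windowMemory 256m
# Fewer threads means fewer delta windows held in memory at once.
git -C "$repo" config pack.threads 1
# Files larger than this are stored deflated without delta attempts.
git -C "$repo" config core.bigFileThreshold 100m

git -C "$repo" config --get-regexp '^(gc|pack|core)\.'
```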
 
Btw, I happily keep 1.5 GB files in my git repositories and change
them, also with Git for Windows. So for files of that size things
work quite well.

Thread overview: 10+ messages
2014-05-20 15:37 GIT and large files Stewart, Louis (IS)
2014-05-20 16:03 ` Jason Pyeron
2014-05-20 16:53   ` EXT :Re: " Stewart, Louis (IS)
2014-05-20 17:08 ` Marius Storm-Olsen
2014-05-20 17:18 ` Junio C Hamano
2014-05-20 17:24   ` EXT :Re: " Stewart, Louis (IS)
2014-05-20 18:14     ` Junio C Hamano
2014-05-20 18:18       ` Stewart, Louis (IS)
2014-05-20 19:01         ` Konstantin Khomoutov
2014-05-20 18:27     ` Thomas Braun [this message]
