git.vger.kernel.org archive mirror
From: chadrik <chadrik@gmail.com>
To: git@vger.kernel.org
Subject: read-only working copy using symlinks to blobs
Date: Wed, 21 Jan 2009 00:15:11 -0800 (PST)
Message-ID: <21578696.post@talk.nabble.com>


Hi all,
I'm looking into using git to manage a lot of very large binary data. Git
seems particularly suited to this task because it has features for saving
disk space, such as clone --shared, and it's fast because loose objects are
stored with plain zlib compression by default rather than as deltas.

In my mind, there's still one major feature for working with large binaries
that has not been addressed: the ability to check out symbolic or hard links
to blobs into the working copy instead of creating duplicates of the files.

Imagine a scenario where one user is putting large binary files into a git
repo and 100 other users need read-only access to it. They clone the repo
with --shared, which saves disk space for the object files, but each of those
100 working copies still gets its own copy of every binary file at the HEAD
revision. If those files were instead symbolic or hard links to the blob
files in .git/objects, the working copies would need roughly 1/100th of the
disk space.

The crux of the issue is that the blob objects would have to be stored as
exact copies of the original files. From some googling, it seems two things
currently prevent this: 1) blobs are stored compressed, and 2) they include
a small header. Compression can be disabled by setting core.loosecompression
to 0, so that seems like less of an issue. As for the header, wouldn't it be
possible to store it as a separate file per blob object and thus keep the
original data completely pristine?
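To see what the header and the compression setting actually do to the stored bytes, here is a minimal Python sketch of the loose-object encoding. The object id is the SHA-1 of "blob <size>\0" followed by the content, and the file on disk is a zlib stream of that same header plus content. At level 0, zlib stores the data uncompressed, but it still adds a 2-byte zlib header, a few bytes of framing per stored block (each DEFLATE stored block holds at most 65,535 bytes, so the framing recurs throughout a large file), and a 4-byte Adler-32 checksum at the end. The function name loose_blob is an illustrative assumption, and Python's zlib module stands in for git's own deflate calls, so block boundaries may differ from what git writes even though the structure is the same.

```python
import hashlib
import zlib
from typing import Tuple


def loose_blob(data: bytes, level: int = 0) -> Tuple[str, bytes]:
    """Return (object id, on-disk bytes) for data stored as a loose blob.

    The header "blob <size>\\0" is prepended to the content before both
    hashing and deflating; `level` stands in for core.loosecompression.
    """
    raw = b"blob %d\0" % len(data) + data
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw, level)


if __name__ == "__main__":
    data = bytes(range(256)) * 1024  # 256 KiB of "binary" data
    oid, on_disk = loose_blob(data, level=0)
    print(oid)
    print(on_disk == data)           # False: not a pristine copy
    print(on_disk[:2].hex())         # "7801": zlib header for level 0
    print(len(on_disk) - len(data))  # git header + zlib framing + checksum
```

So stripping the git header alone would not leave a file that can be linked into a working copy; the zlib wrapper would have to go too. Note also that objects transferred by a clone or repacked by git gc live inside packfiles rather than as individual files, which a linking scheme would have to account for.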

What are the caveats to a system like this? Any thoughts on its
feasibility?

-chad


-- 
View this message in context: http://www.nabble.com/read-only-working-copy-using-symlinks-to-blobs-tp21578696p21578696.html
Sent from the git mailing list archive at Nabble.com.
