Git development
From: Jakub Narebski <jnareb@gmail.com>
To: Tim Ansell <mithro@mithis.com>
Cc: git@vger.kernel.org, Dana How <danahow@gmail.com>
Subject: Re: Git and Media repositories....
Date: Mon, 03 Nov 2008 01:40:55 -0800 (PST)
Message-ID: <m3ljw1f8qv.fsf@localhost.localdomain>
In-Reply-To: <1225655428.11693.10.camel@vaio>

Tim Ansell <mithro@mithis.com> writes:

> Last week at the GitTogether I led some discussions about how we could
> make Git better support large media repositories (which is one area
> where Subversion still makes sense). It was suggested that I post to this
> list to get a discussion going.
> 
> The general idea is that we always clone the complete meta-data (tags,
> commits and trees) and then only clone blobs when they are needed (using
> something like alternates). This allows us to support shallow, narrow
> and sparse checkouts while still being able to perform operations such
> as committing and merging.
> 
> You can find a copy of the summary presentation at
>  http://www.thousandparsec.net/~tim/media+git.pdf
> 
> I have started working on adapting git to check a remote http alternate
> to provide a proof of concept.
> 
> I appreciate any help or suggestions.

Dana How (CC-ed) worked on better support for large files, but in a
corporate setting.  The solution that came out of all the discussion
and all the patches (not all accepted) was to create a kept packfile for
those large files, and to share those packfiles (perhaps via alternates)
over a network filesystem, instead of keeping separate copies and
transferring them on fetch / push.
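
A minimal sketch of that setup, assuming the large-file packs already
live in a shared object directory on a network filesystem.  It relies
only on two git conventions: a pack with a matching .keep file is left
alone by repacking, and objects/info/alternates lists extra object
directories, one path per line.  The function name and the directory
layout passed to it are hypothetical, and this is not taken from the
patches mentioned above.

```python
from pathlib import Path
from typing import List


def share_large_packs(git_dir: Path, shared_objects: Path) -> List[Path]:
    """Mark every pack in shared_objects as kept and register
    shared_objects as an alternate of the repository at git_dir.

    Hypothetical helper: git_dir is the repository's .git directory,
    shared_objects is an object directory (containing pack/) on shared
    storage.  Returns the .keep files that now exist.
    """
    kept: List[Path] = []
    for pack in sorted((shared_objects / "pack").glob("pack-*.pack")):
        keep = pack.with_suffix(".keep")  # pack-X.pack -> pack-X.keep
        keep.touch()                      # presence of the file is what matters
        kept.append(keep)

    alternates = git_dir / "objects" / "info" / "alternates"
    alternates.parent.mkdir(parents=True, exist_ok=True)
    existing = alternates.read_text().splitlines() if alternates.exists() else []

    entry = str(shared_objects.resolve())
    if entry not in existing:
        # Rewrite the whole file so that every entry ends with a newline.
        alternates.write_text("\n".join(existing + [entry]) + "\n")
    return kept
```

With this in place the local repository can read the large blobs from
the shared packs without copying them, which is the point of the
approach described above.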


From what I remember there was one serious attempt (by serious I mean
here with patches) to add 'lazy clone' / 'sparse clone' / 'remote
alternates', using some kind of "stub" objects and transferring objects
lazily.  This patch was fairly intrusive, and didn't get accepted.
I think you can find it in the archives.  Unfortunately I haven't
bookmarked this thread...

The problem with lazy clone is that git assumes in many places that if
it has some object, it has all its dependencies.  Lazy clone
(on-demand object loading) breaks this assumption... although in your
case (only blobs of large size can be asked to be loaded lazily) it is
mitigated somewhat.
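
To make the assumption concrete, here is a toy model of the object
graph (not git's actual storage format or code; the object ids and the
helper name are made up for illustration).  It walks from a commit and
reports every referenced object that is absent from the store.

```python
from typing import Dict, List, Set, Tuple

# Toy object store: object id -> (type, ids it refers to).
# Only a model of the commit -> tree -> blob graph.
ObjectStore = Dict[str, Tuple[str, List[str]]]


def missing_dependencies(store: ObjectStore, root: str) -> Set[str]:
    """Return the ids reachable from root that are not in the store."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in store:
            # We cannot look inside an absent object, so the walk stops here.
            missing.add(oid)
            continue
        _kind, refs = store[oid]
        stack.extend(refs)
    return missing


# A complete repository: everything reachable from c1 is present.
full: ObjectStore = {
    "c1": ("commit", ["t1"]),
    "t1": ("tree", ["b-small", "b-large"]),
    "b-small": ("blob", []),
    "b-large": ("blob", []),
}

# A "lazy" repository in the sense proposed: all metadata, one blob absent.
lazy: ObjectStore = {k: v for k, v in full.items() if k != "b-large"}
```

In the full store nothing is missing, which is the state git takes for
granted.  In the lazy store the walk still visits every commit and
tree, and the only gaps are blobs, which refer to nothing else; that is
why restricting lazy loading to blobs limits the damage.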


I also think that you would have to have 'sparse checkout' support.
If you don't have a blob in the object repository (and don't want to
have it there), you cannot check it out.  Fortunately this feature is
very much alive, and is being worked on by Duy (pclouds); see
"What's cooking..." (nd/narrow branch in 'pu').

HTH
-- 
Jakub Narebski
Poland
ShadeHawk on #git
