Git development
* Git and Media repositories....
@ 2008-11-02 19:50 Tim Ansell
  2008-11-03  6:56 ` Johannes Schindelin
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Tim Ansell @ 2008-11-02 19:50 UTC (permalink / raw)
  To: git

Hey guys,

Last week at the GitTogether I led some discussions about how we could
make Git better support large media repositories (which is one area
where Subversion still makes sense). It was suggested that I post to this
list to get a discussion going.

The general idea is that we always clone the complete meta-data (tags,
commits and trees) and then only clone blobs when they are needed (using
something like alternates). This allows us to support shallow, narrow
and sparse checkouts while still being able to perform operations such
as committing and merging.
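
As an illustration of the idea above, here is a minimal Python sketch of
on-demand blob loading. It is not Git code: `LazyBlobStore`, `fetch_remote`
and `remote_fetches` are hypothetical names, and the "remote alternate" is
modelled as a plain callable. The only Git detail it relies on is the blob
object id, which is the SHA-1 of `blob <size>\0` followed by the content.

```python
import hashlib
from typing import Callable, Dict, Optional


def git_blob_id(data: bytes) -> str:
    """Return the object id Git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class LazyBlobStore:
    """Hypothetical blob store that fetches missing blobs on first use."""

    def __init__(self, fetch_remote: Callable[[str], Optional[bytes]]):
        self._local: Dict[str, bytes] = {}
        self._fetch_remote = fetch_remote  # stands in for a remote alternate
        self.remote_fetches = 0

    def has_local(self, oid: str) -> bool:
        return oid in self._local

    def read(self, oid: str) -> bytes:
        # Serve from the local store when the blob is already present.
        if oid in self._local:
            return self._local[oid]
        # Otherwise ask the remote, and verify what came back before caching.
        self.remote_fetches += 1
        data = self._fetch_remote(oid)
        if data is None:
            raise KeyError(f"blob {oid} not available locally or remotely")
        if git_blob_id(data) != oid:
            raise ValueError(f"remote returned corrupt data for {oid}")
        self._local[oid] = data
        return data


# Usage: the "remote" is just a dict here (an assumption for the sketch).
payload = b"large media file contents"
oid = git_blob_id(payload)
remote = {oid: payload}

store = LazyBlobStore(remote.get)
data = store.read(oid)        # first read goes to the remote
data_again = store.read(oid)  # second read is served locally
```

Because the object id is derived from the content, the client can check a
lazily fetched blob against the id recorded in the tree it already has.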

You can find a copy of the summary presentation at
 http://www.thousandparsec.net/~tim/media+git.pdf

I have started adapting git to check a remote HTTP alternate, as a
proof of concept.

I appreciate any help or suggestions.

Tim 'mithro' Ansell


* Re: Git and Media repositories....
  2008-11-02 19:50 Git and Media repositories Tim Ansell
@ 2008-11-03  6:56 ` Johannes Schindelin
  2008-11-03  9:40 ` Jakub Narebski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Johannes Schindelin @ 2008-11-03  6:56 UTC (permalink / raw)
  To: Tim Ansell; +Cc: git

Hi,

On Sun, 2 Nov 2008, Tim Ansell wrote:

> Last week at the GitTogether I led some discussions about how we could
> make Git better support large media repositories (which is one area
> where Subversion still makes sense). It was suggested that I post to this
> list to get a discussion going.
> 
> The general idea is that we always clone the complete meta-data (tags, 
> commits and trees) and then only clone blobs when they are needed (using 
> something like alternates). This allows us to support shallow, narrow 
> and sparse checkouts while still being able to perform operations such 
> as committing and merging.
> 
> You can find a copy of the summary presentation at
>  http://www.thousandparsec.net/~tim/media+git.pdf
> 
> I have started working on adapting git to check a remote http alternate 
> to provide a proof of concept.
> 
> I appreciate any help or suggestions.

You might find this message (and others from the same time frame and 
author) pretty interesting:

http://article.gmane.org/gmane.comp.version-control.git/48485

Ciao,
Dscho


* Re: Git and Media repositories....
  2008-11-02 19:50 Git and Media repositories Tim Ansell
  2008-11-03  6:56 ` Johannes Schindelin
@ 2008-11-03  9:40 ` Jakub Narebski
  2008-11-07 13:00 ` Jakub Narebski
  2008-11-07 13:19 ` Santi Béjar
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Narebski @ 2008-11-03  9:40 UTC (permalink / raw)
  To: Tim Ansell; +Cc: git, Dana How

Tim Ansell <mithro@mithis.com> writes:

> Last week at the GitTogether I led some discussions about how we could
> make Git better support large media repositories (which is one area
> where Subversion still makes sense). It was suggested that I post to this
> list to get a discussion going.
> 
> The general idea is that we always clone the complete meta-data (tags,
> commits and trees) and then only clone blobs when they are needed (using
> something like alternates). This allows us to support shallow, narrow
> and sparse checkouts while still being able to perform operations such
> as committing and merging.
> 
> You can find a copy of the summary presentation at
>  http://www.thousandparsec.net/~tim/media+git.pdf
> 
> I have started working on adapting git to check a remote http alternate
> to provide a proof of concept.
> 
> I appreciate any help or suggestions.

Dana How (CC-ed) worked on better support for large files, but in a
corporate setting.  The solution that came out of all the discussion
and all the patches (not all accepted) was to create a kept packfile for
those large files, and to share those packfiles (perhaps via alternates)
over a network filesystem, instead of keeping separate copies and
transferring them on fetch / push.
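
A rough sketch of that setup, assuming the shared object directory already
sits on a network filesystem. The helper names `add_alternate` and
`mark_pack_kept` are made up for illustration; what they touch are two
existing Git conventions: the `objects/info/alternates` file, which lists
additional object directories one per line, and a `.keep` file next to a
pack, which tells repacking to leave that pack alone.

```python
import tempfile
from pathlib import Path


def add_alternate(git_dir: Path, shared_objects_dir: Path) -> Path:
    """Add shared_objects_dir to the repository's alternates file (once)."""
    alternates = git_dir / "objects" / "info" / "alternates"
    alternates.parent.mkdir(parents=True, exist_ok=True)
    entries = alternates.read_text().splitlines() if alternates.exists() else []
    entry = str(shared_objects_dir)
    if entry not in entries:
        entries.append(entry)
    alternates.write_text("\n".join(entries) + "\n")
    return alternates


def mark_pack_kept(pack_file: Path) -> Path:
    """Create pack-<id>.keep next to pack-<id>.pack."""
    keep = pack_file.with_suffix(".keep")
    keep.touch()
    return keep


# Usage with throwaway paths; real paths would point at an actual
# repository and at the shared mount (both are assumptions here).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    shared_objects = root / "shared" / "objects"
    pack = shared_objects / "pack" / "pack-1234.pack"
    pack.parent.mkdir(parents=True)
    pack.touch()

    keep_name = mark_pack_kept(pack).name
    alternates_file = add_alternate(root / "repo" / ".git", shared_objects)
    alternates_text = alternates_file.read_text()
```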


From what I remember there was one serious attempt (by serious I mean
one with patches) to add 'lazy clone' / 'sparse clone' / 'remote
alternates', using some kind of "stub" objects and transferring objects
lazily.  The patch was fairly intrusive, and didn't get accepted.
I think you can find it in the archives.  Unfortunately I haven't
bookmarked this thread...

The problem with lazy clone is that git assumes in many places that if
it has some object, it has all its dependencies.  Lazy clone
(on-demand object loading) breaks this assumption... although in your
case (only blobs of large size can be asked to be loaded lazily) it is
mitigated somewhat.
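
To make the assumption concrete, here is a small Python sketch of a
reachability walk that separates missing blobs (tolerable under the
proposal) from missing commits and trees (still an error). The object graph
is a hypothetical in-memory dict, not Git's object database, and
`find_missing` is an invented name. The sketch assumes the type of an absent
object is known, which in a real repository would come from the tree entry
that refers to it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical object graph: id -> (type, ids of objects it refers to).
ObjectGraph = Dict[str, Tuple[str, List[str]]]


def find_missing(graph: ObjectGraph, present: Set[str], root: str) -> Dict[str, List[str]]:
    """Walk from root and classify reachable objects that are not present."""
    missing: Dict[str, List[str]] = {"blob": [], "other": []}
    seen: Set[str] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        kind, children = graph[oid]
        if oid not in present:
            missing["blob" if kind == "blob" else "other"].append(oid)
            continue  # cannot walk into an object we do not have
        stack.extend(children)
    return missing


graph: ObjectGraph = {
    "c1": ("commit", ["t1"]),
    "t1": ("tree", ["b1", "b2"]),
    "b1": ("blob", []),
    "b2": ("blob", []),
}

# All meta-data is present, one large blob was left on the remote.
lazy = find_missing(graph, {"c1", "t1", "b1"}, "c1")

# A tree is absent: the "has an object, has its dependencies" rule is broken.
broken = find_missing(graph, {"c1", "b1", "b2"}, "c1")
```

In the first case only the `blob` list is non-empty, so history walking and
merging on trees can still proceed; in the second case the walk cannot even
discover what lies below the missing tree.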


I also think that you would need 'sparse checkout' support.
If you don't have a blob in the object repository (and don't want to have
it there), you cannot check it out.  Fortunately this feature is very
much alive and is being worked on by Duy (pclouds), see "What's cooking..."
(nd/narrow branch in 'pu').

HTH
-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Git and Media repositories....
  2008-11-02 19:50 Git and Media repositories Tim Ansell
  2008-11-03  6:56 ` Johannes Schindelin
  2008-11-03  9:40 ` Jakub Narebski
@ 2008-11-07 13:00 ` Jakub Narebski
  2008-11-07 13:19 ` Santi Béjar
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Narebski @ 2008-11-07 13:00 UTC (permalink / raw)
  To: Tim Ansell; +Cc: git

Tim Ansell <mithro@mithis.com> writes:

> Last week at the GitTogether I led some discussions about how we could
> make Git better support large media repositories (which is one area
> where Subversion still makes sense). It was suggested that I post to this
> list to get a discussion going.
> 
> The general idea is that we always clone the complete meta-data (tags,
> commits and trees) and then only clone blobs when they are needed (using
> something like alternates). This allows us to support shallow, narrow
> and sparse checkouts while still being able to perform operations such
> as committing and merging.
[...]

Well, the *workaround* you could currently use is to put large media
files in a separate subdirectory, and make this subdirectory into a
submodule.  This uses the fact that you can selectively clone
submodules, or leave them as stubs...

...and this is also the code you might want to look at when
implementing stubs for 'remote' blob objects.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Git and Media repositories....
  2008-11-02 19:50 Git and Media repositories Tim Ansell
                   ` (2 preceding siblings ...)
  2008-11-07 13:00 ` Jakub Narebski
@ 2008-11-07 13:19 ` Santi Béjar
  2008-11-09  4:58   ` Nguyen Thai Ngoc Duy
  3 siblings, 1 reply; 6+ messages in thread
From: Santi Béjar @ 2008-11-07 13:19 UTC (permalink / raw)
  To: Tim Ansell; +Cc: git

On Sun, Nov 2, 2008 at 8:50 PM, Tim Ansell <mithro@mithis.com> wrote:
> Hey guys,
>

[...]

>
> The general idea is that we always clone the complete meta-data (tags,
> commits and trees) and then only clone blobs when they are needed (using
> something like alternates). This allows us to support shallow, narrow
> and sparse checkouts while still being able to perform operations such
> as committing and merging.
>

A related use case could be to remove a blob from a repo while still
being able to work normally with the repo, similar to:

http://wiki.freebsd.org/VCSFeatureObliterate

Santi


* Re: Git and Media repositories....
  2008-11-07 13:19 ` Santi Béjar
@ 2008-11-09  4:58   ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 6+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2008-11-09  4:58 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Tim Ansell, git

On 11/7/08, Santi Béjar <santi@agolina.net> wrote:
> On Sun, Nov 2, 2008 at 8:50 PM, Tim Ansell <mithro@mithis.com> wrote:
>  > Hey guys,
>  >
>
>  [...]
>
>
>  >
>  > The general idea is that we always clone the complete meta-data (tags,
>  > commits and trees) and then only clone blobs when they are needed (using
>  > something like alternates). This allows us to support shallow, narrow
>  > and sparse checkouts while still being able to perform operations such
>  > as committing and merging.
>  >
>
>
> A related use case could be to remove a blob from a repo but being
>  able to work normally with it, similar to:
>
>  http://wiki.freebsd.org/VCSFeatureObliterate

Maybe another use case: encrypted blobs (those are generally
unavailable until the correct password is given, so they are "holes" in
checkout/clone). It could be used to store sensitive content (in $HOME
for example).
-- 
Duy
