'sparse' clone idea

* 'sparse' clone idea
@ 2006-06-14  8:23 Jakub Narebski
  2006-06-14  9:20 ` Johannes Schindelin
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Narebski @ 2006-06-14  8:23 UTC (permalink / raw)
  To: git

I wonder if the 'sparse clone' idea described below would avoid the most
difficult part of the 'shallow clone' idea, namely the occasional need to
un-cauterize history. See: (<7vac8lidwi.fsf@assigned-by-dhcp.cox.net>).

A 'sparse clone' begins like a 'shallow clone': full history is copied down
to a specified point of history (the cut-off or cauterization point of a
shallow clone), but instead of cauterizing the history from that point
downwards, the history is simplified using grafts.

In the sparse part we need:
 * all commits pointed to by tags (if we clone/copy tags)
   and by other refs (if we clone/copy those refs)
 * merge bases for all commits in the full part and in the sparse part,
   _including_ the merge bases themselves
 * all roots

Commits in the sparse part would be connected as in the original history,
only skipping "uninteresting" commits.
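
A minimal sketch of that simplification step, assuming the history is given
as a hypothetical `parents` mapping (commit -> list of parents) and `keep` is
the set of commits retained in the sparse part. Both names, and the function
`simplify_history`, are illustrative only and not part of git. Each kept
commit is grafted onto its nearest kept ancestors, so skipped commits simply
drop out of the chain:

```python
def simplify_history(parents, keep):
    """Map every kept commit to its nearest kept ancestors (its grafts).

    `parents` and `keep` are hypothetical inputs for this sketch; a real
    implementation would read the commit graph from the object database.
    """
    grafts = {}
    for commit in keep:
        found = set()
        seen = set()
        stack = list(parents.get(commit, []))
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            if c in keep:
                # Stop at the first kept commit on this path.
                found.add(c)
            else:
                # Skip an "uninteresting" commit and keep walking down.
                stack.extend(parents.get(c, []))
        grafts[commit] = sorted(found)
    return grafts


# Linear history A <- B <- C <- D <- E, keeping only A, C and E.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
grafts = simplify_history(parents, {"A", "C", "E"})
print(grafts)
```

With this input E is grafted onto C and C onto A, while A stays a root. On
histories with merges this simple walk may record a graft parent that is
itself an ancestor of another graft parent; a fuller version would prune
those.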

Thoughts? Comments?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: 'sparse' clone idea
  2006-06-14  8:23 'sparse' clone idea Jakub Narebski
@ 2006-06-14  9:20 ` Johannes Schindelin
  2006-06-14  9:44   ` Jakub Narebski
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin @ 2006-06-14  9:20 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Wed, 14 Jun 2006, Jakub Narebski wrote:

> I wonder if 'sparse clone' idea described below would avoid the most
> difficult part of 'shallow clone' idea, namely the [sometimes] need to
> un-cauterize history. See: (<7vac8lidwi.fsf@assigned-by-dhcp.cox.net>).

I do not think that is the hardest problem. The hardest thing is to tell 
the server in an efficient manner which objects we have.

Example:

A - B - C - D
    ^ cutoff
        ^ current HEAD

Suppose B is your fake root, C is your HEAD, and you want to fetch D. Now, 
make it a difficult example: both A and D contain a certain blob Z, but 
neither B nor C does. You have to tell the server _in an efficient manner_ 
to send Z also.

And by efficient manner I mean: you may not bring the server down just 
because 5 people with shallow clones decide to fetch from it.
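
The example can be sketched as set arithmetic. This is a toy model, not how
git computes packs: `parents`, `blobs`, the extra blob names `f1` and `f2`,
and the helper `blobs_reachable` are assumptions made up for illustration.
It shows that a server which assumes the client has everything reachable
from C in the real history will leave Z out:

```python
def blobs_reachable(tip, parents, blobs):
    """Collect the blobs of `tip` and of all its ancestors."""
    seen, stack, out = set(), [tip], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        out |= blobs[c]
        stack.extend(parents[c])
    return out


# Real history A - B - C - D; blob Z is in A and D only.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
blobs = {
    "A": {"Z", "f1"},
    "B": {"f1"},
    "C": {"f1", "f2"},
    "D": {"Z", "f2"},
}

# The server believes the client holds everything reachable from C.
server_assumes = blobs_reachable("C", parents, blobs)
naive_pack = blobs["D"] - server_assumes

# The shallow client really holds only B and C (B is its fake root).
client_has = blobs["B"] | blobs["C"]
missing = blobs["D"] - client_has - naive_pack

print(naive_pack, missing)
```

Here `naive_pack` is empty and `missing` contains Z: the client ends up with
a commit D it cannot check out.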

> 'sparse clone' begins like 'shallow clone': full history is copied down to
> specified point of history (cut-off or cauterization point for shallow
> clone), but instead of cauterizing the history from that point downwards,
> the history is simplified using grafts.
> 
> In the sparse part we need:
>  * all commits pointed by tags (if we clone/copy tags) 
>    and other refs (if we clone/copy those tags)
>  * merge bases for all commits in full, and in the sparse part,
>    _including_ merge bases themselves

Hmmm. You cannot know _all_ merge bases beforehand, because you do not 
decide where other people fork off.

>  * all roots

Why?

> Commits in sparse part would be connected like in original history, only
> skipping "uninteresting" commits.

Interesting idea, though I do not think it solves the most pressing 
problems we have with shallow clones.

Ciao,
Dscho

P.S.: I think the problems of a lazy clone are much easier to solve...


* Re: 'sparse' clone idea
  2006-06-14  9:20 ` Johannes Schindelin
@ 2006-06-14  9:44   ` Jakub Narebski
  0 siblings, 0 replies; 3+ messages in thread
From: Jakub Narebski @ 2006-06-14  9:44 UTC (permalink / raw)
  To: git

Johannes Schindelin wrote:

> On Wed, 14 Jun 2006, Jakub Narebski wrote:
> 
>> I wonder if 'sparse clone' idea described below would avoid the most
>> difficult part of 'shallow clone' idea, namely the [sometimes] need to
>> un-cauterize history. See: (<7vac8lidwi.fsf@assigned-by-dhcp.cox.net>).
> 
> I do not think that is the hardest problem. The hardest thing is to tell 
> the server in an efficient manner which objects we have.
> 
> Example:
> 
> A - B - C - D
>     ^ cutoff
>         ^ current HEAD
> 
> Suppose B is your fake root, C is your HEAD, you want to fetch D. Now, 
> make it a difficult example: both A and D contain a certain blob Z, but 
> neither B nor C do. You have to tell the server _in an efficient manner_ 
> to send Z also.
> 
> And by efficient manner I mean: you may not bring the server down just 
> because 5 people with shallow clones decide to fetch from it.

Nah, I think that is solved. See the post by Junio C Hamano mentioned above,
in the "Re: Figured out how to get Mozilla into git" thread:

 http://permalink.gmane.org/gmane.comp.version-control.git/21603

(although it would need an extension to the git protocol). Client and server 
exchange grafts both ways, limiting the commit ancestry graph that both ends
walk to the intersection of the fake views of the ancestry graph that the
two ends have. Then the server uses those virtual grafts to calculate which
objects to send.

The rest is done (or should be done) by history grafting code.
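
A rough sketch of the server side, under loud assumptions: only the
client-to-server half of the exchange is modelled, the graft is passed as a
plain dict, and `parents`, `blobs`, `f1`, `f2` and `blobs_reachable` are
hypothetical names, not an existing git API or protocol message. The server
walks the history through the client's graft, so it no longer assumes the
client has anything below the cutoff B:

```python
def blobs_reachable(tip, parents, blobs, grafts):
    """Collect blobs of `tip` and its ancestors, honouring grafts.

    A graft entry overrides the real parents of a commit; an empty list
    turns that commit into a (fake) root.
    """
    seen, stack, out = set(), [tip], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        out |= blobs[c]
        stack.extend(grafts.get(c, parents[c]))
    return out


# Real history A - B - C - D; blob Z is in A and D only.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
blobs = {
    "A": {"Z", "f1"},
    "B": {"f1"},
    "C": {"f1", "f2"},
    "D": {"Z", "f2"},
}

# Graft sent by the client: B has no parents in its view of history.
client_grafts = {"B": []}

have = blobs_reachable("C", parents, blobs, client_grafts)
want = blobs_reachable("D", parents, blobs, client_grafts)
to_send = want - have

print(to_send)
```

With the graft applied, `have` no longer includes Z, so Z ends up in
`to_send`.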

>>  * merge bases for all commits in full, and in the sparse part,
>>    _including_ merge bases themselves
> 
> Hmmm. You cannot know _all_ merge bases beforehand, because you do not 
> decide where other people fork off.

By all merge bases I mean: merge bases for all commits in the full part;
merge bases for commits in the full part and commits pointed to by tags in
the sparse part; merge bases for all of those and the merge bases already
found in the sparse part; and so on, recursively.
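
In other words, the kept set is closed under taking merge bases. A minimal
sketch of that fixed point, assuming a hypothetical `parents` mapping and a
starting set that stands in for the full-part commits plus the tagged
commits (none of these names come from git; the brute-force ancestor walks
are for clarity only):

```python
from itertools import combinations


def ancestors(commit, parents):
    """Return `commit` together with all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    }


def close_under_merge_bases(start, parents):
    """Add merge bases of all pairs until nothing new appears."""
    keep = set(start)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(sorted(keep), 2):
            new = merge_bases(a, b, parents) - keep
            if new:
                keep |= new
                changed = True
                break  # restart the pair scan with the enlarged set
    return keep


# Hypothetical history: main line R-A-B-C, side branches A-X-T1 and R-Y-T2.
parents = {
    "R": [], "A": ["R"], "B": ["A"], "C": ["B"],
    "X": ["A"], "T1": ["X"],
    "Y": ["R"], "T2": ["Y"],
}
kept = close_under_merge_bases({"C", "T1", "T2"}, parents)
print(sorted(kept))
```

Starting from C, T1 and T2, the closure adds A (merge base of C and T1) and
the root R (merge base with T2).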

>>  * all roots
> 
> Why?

Just in case, as the ultimate merge bases.

> P.S.: I think the problems of a lazy clone are much easier to solve...

I still think that the correct idea for the lazy clone is to have soft
grafts, so you have to solve at least part of the shallow clone/sparse
clone problems first.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

