From: Junio C Hamano <junkio@cox.net>
To: Carl Worth <cworth@cworth.org>
Cc: git@vger.kernel.org
Subject: Re: Recent unresolved issues: shallow clone
Date: Fri, 14 Apr 2006 17:25:12 -0700
Message-ID: <7vr73zn0rb.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <87irpb7oma.wl%cworth@cworth.org> (Carl Worth's message of "Fri, 14 Apr 2006 15:56:29 -0700")
Carl Worth <cworth@cworth.org> writes:
> On Fri, 14 Apr 2006 02:31:36 -0700, Junio C Hamano wrote:
>> I am beginning to think using "graft" to cauterize history
>> for this, while it technically would work, would not be so
>> helpful to users, so the design needs to be worked out again.
>
> As context, here is some of what you mentioned in IRC:
>
>>> Suppose you have this:
>>>
>>>   A---B---C
>>>    \       \
>>>     D---E---F---G
>>>
>>> and you made a shallow clone of C (because that is where the
>>> upstream master was when you made that clone). Then the
>>> upstream updated the master branch tip to G.
>>>
>>> The next update from upstream to your shallow clone would break.
>>> The upstream says: I have G at master.
>>> You say: I want G then. By the way, I have C.
>>>
>>> What it means to tell the other end "I have X" is to promise
>>> that you have X and _everything_ behind it. So the upstream
>>> would send objects necessary to complete D, E, F and G for
>>> "somebody who already has A and B". As a consequence, you
>>> would not see A nor B.
>>>
>>> Even if the only thing you are interested in is to be in sync
>>> with the tip of the upstream, you can end up with an
>>> incomplete tree for G, if some of the blobs or trees contained
>>> in G already exist in A or B. They are not sent -- because
>>> you told the upstream that you have everything necessary to
>>> get to C.
>
> So that's an argument against using a cauterizing graft for the
> shallow clone of C. It definitely confuses the existing protocol to
> say "I have C" if I have only a cauterized C, (its tree only, but none
> of the commits that should be backing C).
That's what I meant by "graft technically works but is
inconvenient".

Maybe after the update to G happens (which means you now have C,
F, G but not A B D E commits), the client side could enumerate
commits on "rev-list ^C G" and cauterize the ones with missing
parents (in this case, F does not have one of its parents).

While doing this would help keep the resulting commit
ancestry sane, it does not solve the problem of missing blobs
and trees. See below.
> So, in the scenario above, the original shallow clone of C would be:
>
> Want C->tree, have nothing.
>
> and the later shallow update to G would be:
>
> Want G->tree, have C->tree
When you ask for G, you do not know what G^{tree} is, so that is
fantasy without a protocol extension.

To solve the missing blobs/trees problem we would probably need a
protocol extension that says it wants to receive enough data to
complete trees and blobs associated with the commits being sent
_without_ assuming the recipient has any trees or blobs other than
what are contained in "have" commits. Then after such a successful
transfer, any parents missing from the commits listed in
"rev-list ^C G" are the ones from the side branch, so the client
can cauterize the commits that refer to them (F in the above
example) appropriately without bothering the server.
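
A toy model may help show the difference between the two meanings of "have". This is only a sketch, not the pack protocol: the file names, the `assume_history` flag and the helper functions are all invented, and each commit is reduced to the set of blobs in its tree.

```python
def ancestors(parents, tip):
    """Return tip and every commit reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def objects_sent(parents, trees, want, have, assume_history):
    """Objects the sender would pack for 'want', given 'have'.

    assume_history=True models today's promise: the receiver has
    'have' and everything behind it.  False models the proposed
    extension: only the tree of the 'have' commit itself is assumed.
    """
    have_commits = ancestors(parents, have) if assume_history else {have}
    new_commits = ancestors(parents, want) - ancestors(parents, have)
    assumed = set().union(*(trees[c] for c in have_commits))
    needed = set().union(*(trees[c] for c in new_commits))
    return needed - assumed


parents = {"A": [], "B": ["A"], "C": ["B"],
           "D": ["A"], "E": ["D"], "F": ["E", "C"], "G": ["F"]}
trees = {
    "A": {"README.v1", "old.c"},
    "B": {"README.v1", "new.c"},
    "C": {"README.v2", "new.c"},
    "D": {"README.v1", "old.c", "side.c"},
    "E": {"README.v1", "old.c", "side.c"},
    "F": {"README.v2", "new.c", "old.c", "side.c"},
    "G": {"README.v2", "new.c", "old.c", "side.c", "extra.c"},
}

local = trees["C"]  # a shallow clone of C holds only C's tree
normal = objects_sent(parents, trees, "G", "C", assume_history=True)
extended = objects_sent(parents, trees, "G", "C", assume_history=False)

missing_normal = trees["G"] - (local | normal)
missing_extended = trees["G"] - (local | extended)
print(sorted(missing_normal), sorted(missing_extended))
```

With the usual promise, old.c is never sent because it already exists in A, so the tree for G ends up incomplete. With the extension the same fetch completes G, at the price of sending more objects.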
However, I think this "do not assume I have any trees behind the
commits I explicitly say I have" must be an option, because it
makes the resulting transfer unnecessarily more expensive for
normal uses. A fetch of the Linux kernel once a day would
update about a couple of hundred commits, each of which touches
only 3 paths on average (so that would be 600 files out of an
18,000-file tree). When side-branch merges are involved, usually
many things in G (and F) are unchanged since either A or C, but
the extension we are discussing forbids reusing what is found
in A (it still allows reusing what is found in C).
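
The back-of-the-envelope figure can be written out as follows. The numbers are the rough ones from the paragraph above, and the product is an upper bound, since it assumes no two commits touch the same path.

```python
commits_per_day = 200    # "a couple of hundred commits"
paths_per_commit = 3     # average paths touched by one commit
tree_size = 18_000       # files in the tree

# Upper bound: assumes every commit touches distinct paths.
touched = commits_per_day * paths_per_commit
fraction = touched / tree_size

print(f"{touched} paths touched, about {fraction:.1%} of the tree")
```

So a normal daily fetch needs only a few percent of the tree, which is why resending unchanged trees and blobs by default would be wasteful.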
> A final step of a shallow clone would then require creating a new
> parent-less commit object so that there's something to point refs/head
> at, (or maybe rather than being parentless, they could be chained
> together with each update?).
Rewriting commit objects transferred to the cloner is something
you would _not_ want to do (e.g. rewriting commit F to say it
has only one parent, C). The history based on that would diverge
from parents and would become unmergeable. It is cleaner to
just make a new graft entry to say "As far as this repository is
concerned, F has one parent C". Shallowness of the repository
and its slightly different view of history is a local matter.
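
For illustration, a graft entry of that kind is a single line in the repository's info/grafts file: the commit's object name followed by the object names of the parents the local repository should pretend it has. A minimal sketch, in which the helper name and the placeholder object names are made up:

```python
def graft_line(commit, parents):
    """Format one grafts entry: commit name, then its pretend parents."""
    ids = [commit, *parents]
    for oid in ids:
        if len(oid) != 40 or any(ch not in "0123456789abcdef" for ch in oid):
            raise ValueError(f"not a full SHA-1 object name: {oid!r}")
    return " ".join(ids)


F = "f" * 40  # placeholder for commit F's object name
C = "c" * 40  # placeholder for commit C's object name

entry = graft_line(F, [C])
print(entry)
```

The commit object for F is left untouched; only this local file records that the repository treats C as its sole parent.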