git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <junkio@cox.net>
To: "Martin Langhoff" <martin.langhoff@gmail.com>
Cc: "Jon Smirl" <jonsmirl@gmail.com>, git@vger.kernel.org
Subject: Re: Change set based shallow clone
Date: Thu, 07 Sep 2006 22:30:51 -0700	[thread overview]
Message-ID: <7vac5ancvo.fsf@assigned-by-dhcp.cox.net> (raw)
In-Reply-To: <46a038f90609072054u5ec8bc46x9878a601953b2c5d@mail.gmail.com> (Martin Langhoff's message of "Fri, 8 Sep 2006 15:54:11 +1200")

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> On 9/8/06, Junio C Hamano <junkio@cox.net> wrote:
> ...
>> Anyway, it is a useful feature, an important operation, and
>> it involves a lot of thinking (and hard work, obviously).
>>
>> I will not think about this anymore, until I have absolutely
>> nothing else to do for a solid few days, but you probably
>> haven't read this far ;-).
>
> For when you come back, then. I agree this is rather complicated, so
> much so that I suspect that we may be barking up the wrong tree.
>
> People who want shallow clones are actually asking for a "light"
> clone, in terms of "how much do I need to download". If everyone has
> the whole commit chain (but may be missing olden blobs, and even
> trees), the problem becomes a lot easier.

No, I do not think so.  You are just pushing the same problem to
another layer.

Reachability through commit ancestry chain is no more special
than reachability through commit-tree or tree-containment
relationships.  The grafts mechanism happen to treat commit
ancestry more specially than others, but that is not inherent
property of git [*1*].  It is only an implementation detail (and
limitation -- grafts cannot express anything but commit
ancestry) of the current system.

But let's touch a slightly different but related topic first.
People do not ask for shallow clones.  They just want faster
transfer of "everything they care about".  Shallow and lazy
clones are implementation techniques to achieve the goal of
making everything they care about appear available, typically
taking advantage of the fact that people care about recent past
a lot more than they do about ancient history.

Shallow clones do so by explicitly saying "sorry you told me
earlier you do not care about things older than X so if you now
care about things older than X please do such and such to deepen
your history".  What you are saying is a variant of shallow that
says "NAK; commits I have but trees and blobs associated with
such an old commit you have to ask me to retrieve from
elsewhere".  Lazy clones do so by "faulting missing objects on
demand" [*2*].  That is essentially automating the "please do
such and such" part.  So they are all not that different.

Now, first and foremost, while I would love to have a system
that gracefully operates with a "sparse" repository that lacks
objects that should exist from tag/commit/tree/blob reachability
point of view, it is an absolute requirement that I can tell why
objects are missing from a repository when I find some are
missing by running fsck-objects [*3*].  If repository is a
shallow clone, not having some object may be expected, but I
want to be able to tell repository corruption locally even in
that case, so "assume lack of object is perhaps it was not
downloaded by shallow cloning" is an unacceptable attitude, and
"when you find an object missing from your repository, you can
ask the server [*4*], and if the server does not have it then
you know your repository is corrupt otherwise it is Ok" is a
wrong answer.

I talked about the need of upload-pack protocol extension to
exchange grafts information between uploader and downloader to
come up with the resulting graft and update the grafts in
downloaded repository after objects tranfer is done.  It is
needed because by having such cauterizing graft entries I can be
sure that the repository is not corrupt when fsck-objects that
does look at grafts says "nothing is missing".  Jon talked about
"fault-in on demand and leave stub objects until downloading the
real thing"; those stub objects are natural extension of grafts
facility but extends to trees and blobs.  Either of these would
allow me to validate the sparse repository locally.

Now I'll really shut up ;-).


[*1*] For example, rev-list --object knows that it needs to show
the tree-containment when given only a tree object without any commit.

[*2*] You probably could write a FUSE module that mount on
your .git/objects, respond to accesses to .git/objects/??
directory and files with 38-byte hexadecimal names under them by
talking to other repositories over the wire.  No, I am not going
to do it, and I will probably not touch it even with ten-foot
pole for reasons I state elsewhere in this message.

[*3*] The real fsck-objects always looks at grafts.  The
fictitious version of fsck-objects I am talking about here is
the one that uses parents recorded on the "parent " line of
commit objects, that is, the true object level reachability.

[*4*] In git, there is no inherent server vs client or upstream
vs downstream relationship between repositories.  You may be
even fetching from many people and do not have a set upstream at
all.  Or you are _the_ upstream, and your notebook has the
latest devevelopment history, and after pushing that latest
history to your mothership repository, you may decide you do not
want ancient development history on a puny notebook, and locally
cauterize the history on your notebook repository and prune
ancient stuff.  The objects missing from the notebook repository
are retrievable from your mothership repository again if/when
needed and you as the user would know that (and if you are lucky
you may even remember that a few months later), but git doesn't,
and there is no reason for git to want to know it.  If we want
to do "fault-in on demand", we need to add a way to say "it is
Ok for this repository to be incomplete -- objects that are
missing must be completed from that repository (or those
repositories)".  But that's quite a special case of having a
fixed upstream.

  reply	other threads:[~2006-09-08  5:30 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-07 19:52 Change set based shallow clone Jon Smirl
2006-09-07 20:21 ` Jakub Narebski
2006-09-07 20:41   ` Jon Smirl
2006-09-07 21:33     ` Jeff King
2006-09-07 21:51       ` Jakub Narebski
2006-09-07 21:37     ` Jakub Narebski
2006-09-07 22:14     ` Junio C Hamano
2006-09-07 23:09       ` Jon Smirl
2006-09-10 23:20         ` Anand Kumria
2006-09-08  8:48     ` Andreas Ericsson
2006-09-07 22:07 ` Junio C Hamano
2006-09-07 22:40   ` Jakub Narebski
2006-09-08  3:54   ` Martin Langhoff
2006-09-08  5:30     ` Junio C Hamano [this message]
2006-09-08  7:15       ` Martin Langhoff
2006-09-08  8:33         ` Junio C Hamano
2006-09-08 17:18         ` A Large Angry SCM
2006-09-08 14:20       ` Jon Smirl
2006-09-08 15:50         ` Jakub Narebski
2006-09-09  3:13           ` Petr Baudis
2006-09-09  8:39             ` Jakub Narebski
2006-09-08  5:05   ` Aneesh Kumar K.V
2006-09-08  1:01 ` linux
2006-09-08  2:23   ` Jon Smirl
2006-09-08  8:36     ` Jakub Narebski
2006-09-08  8:39       ` Junio C Hamano
2006-09-08 18:42     ` linux
2006-09-08 21:13       ` Jon Smirl
2006-09-08 22:27         ` Jakub Narebski
2006-09-08 23:09         ` Linus Torvalds
2006-09-08 23:28           ` Jon Smirl
2006-09-08 23:45             ` Paul Mackerras
2006-09-09  1:45               ` Jon Smirl
2006-09-10 12:41             ` Paul Mackerras
2006-09-10 14:56               ` Jon Smirl
2006-09-10 16:10                 ` linux
2006-09-10 18:00                   ` Jon Smirl
2006-09-10 19:03                     ` linux
2006-09-10 20:00                       ` Linus Torvalds
2006-09-10 21:00                         ` Jon Smirl
2006-09-11  2:49                           ` Linus Torvalds
2006-09-10 22:41                         ` Paul Mackerras
2006-09-11  2:55                           ` Linus Torvalds
2006-09-11  3:18                             ` Linus Torvalds
2006-09-11  6:35                               ` Junio C Hamano
2006-09-11 18:54                               ` Junio C Hamano
2006-09-11  8:36                             ` Paul Mackerras
2006-09-11 14:26                               ` linux
2006-09-11 15:01                                 ` Jon Smirl
2006-09-11 16:47                                 ` Junio C Hamano
2006-09-11 21:52                                   ` Paul Mackerras
2006-09-11 23:47                                     ` Junio C Hamano
2006-09-12  0:06                                       ` Jakub Narebski
2006-09-12  0:18                                         ` Junio C Hamano
2006-09-12  0:25                                           ` Jakub Narebski
2006-09-11  9:04                             ` Jakub Narebski
2006-09-10 18:51                 ` Junio C Hamano
2006-09-11  0:04                   ` Shawn Pearce
2006-09-11  0:42                     ` Junio C Hamano
2006-09-11  0:03               ` Shawn Pearce
2006-09-11  0:41                 ` Junio C Hamano
2006-09-11  1:04                   ` Jakub Narebski
2006-09-11  2:44                     ` Shawn Pearce
2006-09-11  5:27                       ` Junio C Hamano
2006-09-11  6:08                         ` Shawn Pearce
2006-09-11  7:11                           ` Junio C Hamano
2006-09-11 17:52                             ` Shawn Pearce
2006-09-11  2:11                   ` Jon Smirl
2006-09-09  1:05           ` Paul Mackerras
2006-09-09  2:56             ` Linus Torvalds
2006-09-09  3:23               ` Junio C Hamano
2006-09-09  3:31               ` Paul Mackerras
2006-09-09  4:04                 ` Linus Torvalds
2006-09-09  8:47                   ` Marco Costalba
2006-09-09 17:33                     ` Linus Torvalds
2006-09-09 18:04                       ` Marco Costalba
2006-09-09 18:44                         ` linux
2006-09-09 19:17                           ` Marco Costalba
2006-09-09 20:05                         ` Linus Torvalds
2006-09-09 20:43                           ` Jeff King
2006-09-09 21:11                             ` Junio C Hamano
2006-09-09 21:14                               ` Jeff King
2006-09-09 21:40                             ` Linus Torvalds
2006-09-09 22:54                               ` Jon Smirl
2006-09-10  0:18                                 ` Linus Torvalds
2006-09-10  1:22                                   ` Junio C Hamano
2006-09-10  3:49                           ` Marco Costalba
2006-09-10  4:13                             ` Junio C Hamano
2006-09-10  4:23                               ` Marco Costalba
2006-09-10  4:46                                 ` Marco Costalba
2006-09-10  4:54                                 ` Junio C Hamano
2006-09-10  5:14                                   ` Marco Costalba
2006-09-10  5:46                                     ` Junio C Hamano
2006-09-10 15:21                                     ` linux
2006-09-10 18:32                                       ` Marco Costalba
2006-09-11  9:56                                       ` Paul Mackerras
2006-09-11 12:39                                         ` linux
2006-09-10  9:49                                   ` Jakub Narebski
2006-09-10 10:28                                   ` Josef Weidendorfer
  -- strict thread matches above, loose matches on Subject: below --
2006-09-09 10:31 linux
2006-09-09 13:00 ` Marco Costalba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vac5ancvo.fsf@assigned-by-dhcp.cox.net \
    --to=junkio@cox.net \
    --cc=git@vger.kernel.org \
    --cc=jonsmirl@gmail.com \
    --cc=martin.langhoff@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).