[RFC] shallow clone

* [RFC] shallow clone
@ 2006-01-30  7:18 Junio C Hamano
  2006-01-30 11:39 ` Johannes Schindelin
       [not found] ` <43DF1F1D.1060704@innova-card.com>
  0 siblings, 2 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-01-30  7:18 UTC (permalink / raw)
  To: git

Shallow History Cloning
=======================

One good thing about a git repository is that each clone is a
freestanding and complete entity, and you can keep developing in
it offline, without talking to the outside world, knowing that
you can sync with them later when online.

It is also a bad thing.  It gives people working on projects
with long development history stored in CVS a heart attack when
we tell them that their clones need to store the whole history.

There was a suggestion by Linus to allow a partial clone using a
syntax like this:

	$ git clone --since=v2.6.14 git://.../linux-2.6/ master

Here is an outline of what changes are needed to the current
core to do this.

Strategy
--------

We have `info/grafts` mechanism to fake parent information for
commit objects.  Using this facility, we could roughly do:

. Download the full tree for v2.6.14 commit and store its
  objects locally.

. Set up `info/grafts` to lie to the local git that Linux kernel
  history began at v2.6.14 version.

. Run `git fetch git://.../linux-2.6 master`, with a local ref
  pointing at v2.6.14 commit, to pretend that we have everything
  up to v2.6.14 to `upload-pack` running on the other end.

. Update the `origin` branch with the master commit object name
  we just fetched from Linus.

There are some issues.

. In the fetch above to obtain everything after v2.6.14, and
  future runs of `git fetch origin`, if a blob that is in the
  commit being fetched happens to match what used to be in a
  commit that is older than v2.6.14 (e.g. a patch was reverted),
  `upload-pack` running on the other end is free to omit sending
  it, because we are telling it that we are up to date with
  respect to v2.6.14.  Although I think the current `rev-list
  --objects` implementation does not always do such a revert
  optimization if the revert is to a blob in a revision that is
  sufficiently old, it is free to optimize more aggressively in
  the future.

. Later when the user decides to fetch older history, the
  operation can become a bit cumbersome.

I think the latter one is cumbersome but is doable -- we could
do the equivalent of:

	$ git clone --since=v2.6.13 origin v2.6.14

store all the objects obtained by such a clone/fetch operation,
and record that our history now begins at v2.6.13.  So let's
worry about that later.

For the first issue, we need to have the other end cooperate
while fetching from it.  If the other end also thinks the
development started at v2.6.14, even if we tell that we have the
history up to v2.6.14 (or a commit we obtained since then),
there is no way for `upload-pack` running there to optimize too
aggressively and assume we have a blob that appeared in v2.6.13.
More simply, we do not have to tell them we have anything -- if
the other end thinks the epoch is at v2.6.14, only commits that
come later will be sent to us.
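
The effect of a cooperating sender can be illustrated with a toy
model.  This is a minimal sketch, not git code: the three-commit
history, the blob names, and the `objects_to_send` helper are
hypothetical, and it assumes the most aggressive sender, one that
omits every object reachable from what the receiver claims to have.

```python
# Toy history, oldest first: v2.6.13 -> v2.6.14 -> master.
# Each commit maps to (parents, set of blob names in its tree).
HISTORY = {
    "v2.6.13": ([], {"old-blob", "common"}),
    "v2.6.14": (["v2.6.13"], {"common"}),
    "master": (["v2.6.14"], {"old-blob", "common"}),  # a revert brought old-blob back
}


def reachable_blobs(tip, epoch=None):
    """Blobs reachable from tip; if epoch is set, that commit is treated as parentless."""
    seen, blobs, stack = set(), set(), [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents, tree = HISTORY[commit]
        blobs |= tree
        if commit != epoch:
            stack.extend(parents)
    return blobs


def objects_to_send(want, have, epoch=None):
    """Hypothetical sender logic: what `want` needs minus what `have` can reach."""
    return reachable_blobs(want, epoch) - reachable_blobs(have, epoch)


# Sender sees the full history and may assume we hold old-blob via v2.6.13.
print(sorted(objects_to_send("master", "v2.6.14")))                   # []
# Sender also believes history starts at v2.6.14, so old-blob is sent.
print(sorted(objects_to_send("master", "v2.6.14", epoch="v2.6.14")))  # ['old-blob']
```

In the first call the shallow receiver would end up without
`old-blob`; in the second the sender cannot make that assumption.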

Design
------

First, to bootstrap the process, we would need to add a way to
obtain all objects associated with a commit.  We could write a new
program, or we could implement this as a protocol extension to
`upload-pack`.  My current inclination is the latter.

When talking with `upload-pack` that supports this extension,
the downloader can give one commit object name and get a pack
that contains all the objects in the tree associated with that
commit, plus the commit object itself.  This is a rough
equivalent of running the commit walker with the `-t` flag.

Another functionality we would need is to tell `upload-pack` to
use `info/grafts` of downloader's choice.  With this, after
fetching the objects for v2.6.14 commit, the downloader can set
up its own grafts file to cauterize the development history at
v2.6.14, and tell the `upload-pack` to pretend the kernel
history starts at that commit, while sending the tip of Linus'
development track to us.

Using the extended protocol (let's call it 'shallow' extension),
a clone to create a repository that has only recent kernel
history since v2.6.14 goes like this:

The first phase is to fetch v2.6.14 itself.

[NOTE]
Most likely this is not directly run by the user but is run as
the first command invoked by the shallow clone script.

1. The `fetch-pack` command acquires a new option, `--single`:

	$ git-fetch-pack --single git://.../linux-2.6/ v2.6.14

   This talks with `upload-pack` on the kernel.org server via
   `git-daemon`.

2. `upload-pack` tells the fetcher what commits it has,
   what their refs are, and what protocol extensions it
   supports, as usual.

3. If it does not see `shallow` extension supported, there is no
   way to get a single tree, so things fail here.  Otherwise, it
   sends `single X{40}\0` request, instead of the usual `want`
   line.  The object name sent here is the desired commit.

4. `upload-pack` notices this is a single commit request, and
   sends an ACK if it can satisfy the request (or a NAK if it
   can't, e.g. it does not have the asked commit).  Instead of
   doing the usual `get_common_commits` followed by
   `create_pack_file`, it does:

	$ git rev-list -n1 --objects $commit | git pack-objects --stdout

   and sends the result out.

5. The fetcher checks the ACK and receives the objects.
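
What step 4 collects can be sketched with a toy object store.  The
following is an illustration only: the object names and the
`COMMITS`/`TREES` tables are made up, and the traversal order is an
assumption rather than the order git itself would emit.

```python
# Toy object store; names are placeholders, not real object names.
COMMITS = {"c1": "t-root"}                 # commit -> root tree
TREES = {
    "t-root": ["b-readme", "t-src"],       # tree -> entries (blobs or subtrees)
    "t-src": ["b-main", "b-util"],
}


def objects_for_single_commit(commit):
    """List the commit, then every tree and blob reachable from its root tree.

    Parent commits are deliberately never visited, which is the point of a
    single-commit request.
    """
    result = [commit]
    stack = [COMMITS[commit]]
    seen = set()
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        result.append(obj)
        # Only trees have entries; blobs are leaves.
        stack.extend(reversed(TREES.get(obj, [])))
    return result


print(objects_for_single_commit("c1"))
# ['c1', 't-root', 'b-readme', 't-src', 'b-main', 'b-util']
```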

After the above exchange, we have downloaded v2.6.14 commit and
its objects but not its history.  `git-fetch-pack` would output
the tag object name for `v2.6.14` and we would stash it away in
`$GIT_DIR/FETCH_HEAD` as usual.  Then we set up `info/grafts`
with this:

	$ git rev-parse FETCH_HEAD^{commit} >"$GIT_DIR/info/grafts"

This cauterizes the history on our end.
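
A grafts entry is a line holding a commit object name followed by
the names of its pretended parents; a line with the commit alone
therefore means "no parents".  A minimal sketch of that reading,
with hypothetical helper names and placeholder object names:

```python
def parse_grafts(text):
    """Map each grafted commit to its list of pretended parents (possibly empty)."""
    grafts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


def effective_parents(commit, real_parents, grafts):
    """Parents as seen through the grafts; the real parents when no entry exists."""
    return grafts.get(commit, real_parents)


EPOCH = "a" * 40          # placeholder for the v2.6.14 commit object name
REAL_PARENT = "b" * 40    # placeholder for its real parent
CHILD = "c" * 40          # placeholder for a later commit

grafts = parse_grafts(EPOCH + "\n")
print(effective_parents(EPOCH, [REAL_PARENT], grafts))  # []
print(effective_parents(CHILD, [EPOCH], grafts) == [EPOCH])  # True
```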

The second phase of the shallow clone is to fetch the history
since v2.6.14 to the tip.

1. The `fetch-pack` command is run as usual.  Most likely the
   command line run by the shallow clone script would be:

	$ git fetch-pack git://.../linux-2.6/ master

   Notice there is nothing magical about it.  It is just the
   business as usual.

2. `upload-pack` does its usual greeting to the downloader.

3. We notice `shallow` extension again, and first send out
   `graft X{40}\0` request.  The syntax of graft request would
   be `graft ` followed by one or more commit object names on a
   line separated with SP.  After sending out all the needed
   graft requests (in this example there is only one, to
   cauterize the history at v2.6.14), it does the usual `want
   X{40}\0multi_ack` and a flush.

4. `upload-pack` notices graft requests, reinitializes its graft
   information with what it receives from the other end, and
   then records `want`.

5. After the above steps, the usual `upload-pack` vs
   `fetch-pack` exchange continues and objects needed to
   complete Linus' tip of the development trail for somebody who
   has v2.6.14 are sent in a pack.  The difference from the
   usual operation is that `upload-pack` during this run thinks
   v2.6.14 commit does not have any parent.
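
The request sent in step 3 might be framed as below.  The pkt-line
framing (four hex digits giving the total length, then the payload,
with `0000` as a flush) is what git uses on the wire; the `graft`
command is only the proposal above, and the payload layout,
including the trailing newline, is an assumption made for this
sketch.

```python
FLUSH = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def shallow_fetch_request(graft_entries, want, extensions="multi_ack"):
    """Build the bytes for the hypothetical graft + want exchange.

    graft_entries is a list of entries, each a list of commit object names
    to be joined with SP on one `graft` line.
    """
    out = b""
    for names in graft_entries:
        out += pkt_line(("graft " + " ".join(names) + "\n").encode("ascii"))
    out += pkt_line(("want " + want + "\0" + extensions + "\n").encode("ascii"))
    return out + FLUSH


EPOCH = "a" * 40   # placeholder for the v2.6.14 commit
TIP = "b" * 40     # placeholder for the tip of master

request = shallow_fetch_request([[EPOCH]], TIP)
print(request[:10])   # b'0033graft '
print(request[-4:])   # b'0000'
```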

The exact sequence from the second part of the initial "shallow
clone" can be used for further updates.

There is a small issue about the actual implementation.  In the
above description I pretended that `upload-pack` can be told to
use phony grafts information, but in the current implementation
the program that needs to use phony grafts information is
`rev-list` spawned from it.  We _could_ point the GIT_GRAFT_FILE
environment variable at a temporary file while we do so,
but I'd like to avoid using a temporary file if possible, given
that `upload-pack` is run from `git-daemon`.  Maybe we could
give --read-graft-from-stdin flag to `rev-list` for this
purpose.
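
The difference between the two approaches is only in how the graft
lines reach the child process.  The sketch below shows the
pipe-based variant with a stand-in child; `--read-graft-from-stdin`
does not exist, and the child here is a tiny Python one-liner, not
`rev-list`.

```python
import subprocess
import sys

# Stand-in for a child that would read graft lines on standard input, in the
# spirit of the proposed (not existing) --read-graft-from-stdin flag.
CHILD = "import sys; print(sum(1 for line in sys.stdin if line.strip()))"


def count_grafts_via_stdin(graft_lines):
    """Feed graft lines to a child process over a pipe; no temporary file is created."""
    data = "".join(line + "\n" for line in graft_lines)
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=data,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(done.stdout)


print(count_grafts_via_stdin(["a" * 40]))  # 1
```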

Anybody want to try?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] shallow clone
  2006-01-30  7:18 [RFC] shallow clone Junio C Hamano
@ 2006-01-30 11:39 ` Johannes Schindelin
  2006-01-30 11:58   ` Simon Richter
  2006-01-30 18:46   ` Junio C Hamano
       [not found] ` <43DF1F1D.1060704@innova-card.com>
  1 sibling, 2 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-30 11:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Sun, 29 Jan 2006, Junio C Hamano wrote:

> Strategy
> --------
> 
> We have `info/grafts` mechanism to fake parent information for
> commit objects.  Using this facility, we could roughly do:
> 
> . Download the full tree for v2.6.14 commit and store its
>   objects locally.

On first read, I mistook "tree" for "commit"...

> . Set up `info/grafts` to lie to the local git that Linux kernel
>   history began at v2.6.14 version.

Maybe also record this in .git/config, so that you can

- disallow fetching from this repo, and
- easily extend the shallow copy to a larger shallow one, or a full one.

> . Run `git fetch git://.../linux-2.6 master`, with a local ref
>   pointing at v2.6.14 commit, to pretend that we have everything
>   up to v2.6.14 to `upload-pack` running on the other end.

How about refs/tags/start_shallow?

> . Update the `origin` branch with the master commit object name
>   we just fetched from Linus.
> 
> Design
> ------
>
> [...]
>
> Another functionality we would need is to tell `upload-pack` to
> use `info/grafts` of downloader's choice.  With this, after
> fetching the objects for v2.6.14 commit, the downloader can set
> up its own grafts file to cauterize the development history at
> v2.6.14, and tell the `upload-pack` to pretend the kernel
> history starts at that commit, while sending the tip of Linus'
> development track to us.

Why not just start another fetch? Then, "have <refs/tags/start_shallow>" 
would be sent, and upload-pack does the right thing?

If you absolutely want to get only one pack, which then is stored as-is, 
upload-pack could start two rev-list processes: one for the tree and one 
for all the rest.

> [...]
> 
> [NOTE]
> Most likely this is not directly run by the user but is run as
> the first command invoked by the shallow clone script.

Better make it an option to git-clone

> 4. `upload-pack` notices this is a single commit request, and
>    sends an ACK if it can satisfy the request (or a NAK if it
>    can't, e.g. it does not have the asked commit).  Instead of
>    doing the usual `get_common_commits` followed by
>    `create_pack_file`, it does:
> 
> 	$ git rev-list -n1 --objects $commit | git pack-objects --stdout

Here it could say

(git rev-list -n1 --objects $commit_since; git rev-list --objects 
	^$commit_since $commit) | git pack-objects --stdout

If the former is still needed (e.g. for git-tar-remote-tree), we could 
distinguish "single <ref>" and "shallow <ref>" commands.
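
At the level of commits, the two `rev-list` calls combine as
sketched below.  The history and the helper names are hypothetical,
and trees and blobs are left out to keep the model small.

```python
# Toy linear history, oldest first: c1 -> c2 -> since -> c4 -> tip.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "since": ["c2"],
    "c4": ["since"],
    "tip": ["c4"],
}


def ancestors(commit):
    """The commit plus everything reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def shallow_commits(since, tip):
    """Commits covered by the two calls: `since` itself (the -n1 part),
    plus ancestors of `tip` that `since` cannot reach (the ^since tip part)."""
    return {since} | (ancestors(tip) - ancestors(since))


print(sorted(shallow_commits("since", "tip")))  # ['c4', 'since', 'tip']
```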

> [...]
> 
> The second phase of the shallow clone is to fetch the history
> since v2.6.14 to the tip.

As I outlined above, I don't see the need for this.

Ciao,
Dscho


* Re: [RFC] shallow clone
  2006-01-30 11:39 ` Johannes Schindelin
@ 2006-01-30 11:58   ` Simon Richter
  2006-01-30 12:13     ` Johannes Schindelin
  2006-01-30 19:25     ` Junio C Hamano
  2006-01-30 18:46   ` Junio C Hamano
  1 sibling, 2 replies; 30+ messages in thread
From: Simon Richter @ 2006-01-30 11:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

>>. Set up `info/grafts` to lie to the local git that Linux kernel
>>  history began at v2.6.14 version.

> Maybe also record this in .git/config, so that you can

I like that "config" thing less and less every day. It appears to become 
a kind of registry, where having dedicated files for specific 
functionality would provide the robustness of tools not having to touch 
things they do not care about; but that's just personal opinion.

> - disallow fetching from this repo, and

Why? It's perfectly acceptable to pull from an incomplete repo, as long 
as you don't care about the old history.

> - easily extend the shallow copy to a larger shallow one, or a full one.

Hrm, I think there should also be a way to shrink a repo and "forget" 
old history occasionally (obviously, use of that feature would be highly 
discouraged).

>>. Run `git fetch git://.../linux-2.6 master`, with a local ref
>>  pointing at v2.6.14 commit, to pretend that we have everything
>>  up to v2.6.14 to `upload-pack` running on the other end.

> How about refs/tags/start_shallow?

No, as that would imply that cloning from such a repo is disallowed.

IMO, it may be a lot more robust to just have a list of "cutoff" object 
ids in .git/shallow instead of messing with grafts here, as adding or 
removing a line from that file is an easier thing to do for porcelain 
(or by hand) than rewriting the grafts file. Whether that list would be 
inclusive or exclusive would need to be decided still.
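
If the cutoff list were one object id per line, maintaining it
would be a matter of adding or dropping a line.  A minimal sketch
under that assumption (the file format and the helper names are
hypothetical; nothing here is decided in this thread):

```python
def add_cutoff(text, object_id):
    """Return the list text with object_id present exactly once."""
    ids = text.split()
    if object_id not in ids:
        ids.append(object_id)
    return "".join(entry + "\n" for entry in ids)


def remove_cutoff(text, object_id):
    """Return the list text without object_id."""
    return "".join(entry + "\n" for entry in text.split() if entry != object_id)


FIRST = "a" * 40    # placeholder object ids
SECOND = "b" * 40

shallow = add_cutoff("", FIRST)
shallow = add_cutoff(shallow, SECOND)
shallow = remove_cutoff(shallow, FIRST)
print(shallow == SECOND + "\n")  # True
```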

    Simon


* Re: [RFC] shallow clone
  2006-01-30 11:58   ` Simon Richter
@ 2006-01-30 12:13     ` Johannes Schindelin
  2006-01-30 13:25       ` Simon Richter
  2006-01-30 19:25       ` Junio C Hamano
  2006-01-30 19:25     ` Junio C Hamano
  1 sibling, 2 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-30 12:13 UTC (permalink / raw)
  To: Simon Richter; +Cc: Junio C Hamano, git

Hi,

On Mon, 30 Jan 2006, Simon Richter wrote:

> Johannes Schindelin wrote:
> 
> > > . Set up `info/grafts` to lie to the local git that Linux kernel
> > >  history began at v2.6.14 version.
> 
> > Maybe also record this in .git/config, so that you can
> 
> I like that "config" thing less and less every day. It appears to become a
> kind of registry, where having dedicated files for specific functionality
> would provide the robustness of tools not having to touch things they do not
> care about; but that's just personal opinion.

It is becoming sort of a registry: it contains metadata about the current 
repository, easily available to scripts and programs.

I beg to differ on your personal opinion on the grounds that the 
robustness comes from testing, not from diversity. I much prefer to have a 
well tested config mechanism to having dozens of differently formatted 
files with less-than-well tested parsers.

Thank you for the insights in your personal opinion anyway.

> > - disallow fetching from this repo, and
> 
> Why? It's perfectly acceptable to pull from an incomplete repo, as long as you
> don't care about the old history.

Right. But should that be the default? I don't think so. Therefore: 
disable it, and if the user is absolutely sure to do dumb things, she'll 
have to enable it explicitly.

> > - easily extend the shallow copy to a larger shallow one, or a full one.
> 
> Hrm, I think there should also be a way to shrink a repo and "forget" old
> history occasionally (obviously, use of that feature would be highly
> discouraged).

Yes. And you need information about how shallow it used to be. My 
suggestion was to store that information at a place specific to that 
repository (see above).

> > > . Run `git fetch git://.../linux-2.6 master`, with a local ref
> > >  pointing at v2.6.14 commit, to pretend that we have everything
> > >  up to v2.6.14 to `upload-pack` running on the other end.
> 
> > How about refs/tags/start_shallow?
> 
> No, as that would imply that cloning from such a repo is disallowed.

See above.

> IMO, it may be a lot more robust to just have a list of "cutoff" object ids in
> .git/shallow instead of messing with grafts here, as adding or removing a line
> from that file is an easier thing to do for porcelain (or by hand) than
> rewriting the grafts file. Whether that list would be inclusive or exclusive
> would need to be decided still.

The functionality of cutoff objects is included in grafts functionality, 
so why should we spend time on reimplementing a subset of features?

IMHO, adding and removing lines from scripts is fragile.

I beg your pardon, you want to edit this information *by hand*? Wow.

Ciao,
Dscho


* Re: [RFC] shallow clone
  2006-01-30 12:13     ` Johannes Schindelin
@ 2006-01-30 13:25       ` Simon Richter
  2006-01-30 19:25       ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Simon Richter @ 2006-01-30 13:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

[config as a registry]

> It is becoming sort of a registry: it contains metadata about the current 
> repository, easily available to scripts and programs.

Provided you have a parser that can handle it.

> I beg to differ on your personal opinion on the grounds that the 
> robustness comes from testing, not from diversity. I much prefer to have a 
> well tested config mechanism to having dozens of differently formatted 
> files with less-than-well tested parsers.

Indeed. But we already have a method for associating data values with 
keys in a hierarchical namespace, and that one is pretty well tested. :-)

>>Why? It's perfectly acceptable to pull from an incomplete repo, as long as you
>>don't care about the old history.

> Right. But should that be the default? I don't think so. Therefore: 
> disable it, and if the user is absolutely sure to do dumb things, she'll 
> have to enable it explicitly.

What harm is done if I have an incomplete repository? It would probably 
make more sense to emit a warning on clone and explain things if the 
user tries to go to a version she doesn't have.

>>Hrm, I think there should also be a way to shrink a repo and "forget" old
>>history occasionally (obviously, use of that feature would be highly
>>discouraged).

> Yes. And you need information about how shallow it used to be. My 
> suggestion was to store that information at a place specific to that 
> repository (see above).

Indeed, but you are keeping this information in two places, namely the 
grafts file and the config file. This is asking for trouble if they ever 
get out of sync.

>>>How about refs/tags/start_shallow?

>>No, as that would imply that cloning from such a repo is disallowed.

> See above.

Well, I can however see the use case of a developer hosting an 
incomplete repo on a free web service and another developer wanting to 
merge her changes into her (complete) repo. You would have to 
specialcase this tag in the fetch operation to avoid copying it over.

What's probably worse: You can only have a single cutoff point that way. 
You probably want multiple in case you want to cut off at a place where 
development happened in multiple branches that got subsequently merged 
inside the window of objects you keep.
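
A toy history shows why a single cutoff is not enough once two
branches have been merged inside the kept window.  The commit names
and the `visible` helper are hypothetical.

```python
# Two branches fork from root and are merged before tip.
PARENTS = {
    "root": [],
    "a1": ["root"], "a2": ["a1"],      # branch A
    "b1": ["root"], "b2": ["b1"],      # branch B
    "merge": ["a2", "b2"],
    "tip": ["merge"],
}


def visible(tip, cutoffs):
    """Commits reachable from tip when every cutoff commit is treated as parentless."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit not in cutoffs:
            stack.extend(PARENTS[commit])
    return seen


# One cutoff: old history is still reachable through branch B.
print("root" in visible("tip", {"a2"}))        # True
# One cutoff per branch: history really stops at a2 and b2.
print(sorted(visible("tip", {"a2", "b2"})))    # ['a2', 'b2', 'merge', 'tip']
```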

> The functionality of cutoff objects is included in grafts functionality, 
> so why should we spend time on reimplementing a subset of features?

I would ask for the grafts parser to add "fake" grafts when it 
encounters the "shallow" file. Otherwise, it would be hard to 
distinguish between grafts the user made when doing interesting merges, 
and grafts that were created to build a shallow repo, because you would 
need some heuristics to figure out the latter from the former if you 
want to have a function in your porcelain to "pull more/all objects".
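
One way to read this suggestion: the parser loads the user's grafts
as usual, then adds a parentless entry for every id in the
"shallow" list, and keeps the set of cutoffs so porcelain can tell
the two kinds apart.  A minimal sketch, with a hypothetical
`load_grafts` helper and the assumption that a cutoff overrides a
user graft for the same commit:

```python
def load_grafts(grafts_text, shallow_text):
    """Return (grafts, cutoffs): user grafts plus a parentless entry per shallow id."""
    grafts = {}
    for line in grafts_text.splitlines():
        if line.strip():
            commit, *parents = line.split()
            grafts[commit] = parents
    cutoffs = set(shallow_text.split())
    for commit in cutoffs:
        grafts[commit] = []   # "fake" graft: history stops here (assumed to win)
    return grafts, cutoffs


MERGE = "c" * 40    # placeholder: a commit the user grafted two parents onto
LEFT = "d" * 40
RIGHT = "e" * 40
EPOCH = "a" * 40    # placeholder: a shallow cutoff

grafts, cutoffs = load_grafts(MERGE + " " + LEFT + " " + RIGHT + "\n", EPOCH + "\n")
print(grafts[MERGE] == [LEFT, RIGHT])   # True, a user-made graft
print(grafts[EPOCH], EPOCH in cutoffs)  # [] True
```

A "pull more/all objects" function would then only need to look at
`cutoffs`, never at the user-made entries.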

> I beg your pardon, you want to edit this information *by hand*? Wow.

Yes. That is actually the reason I like git so much: I can repair it by 
hand if something breaks, and this can be done with simple commands. I 
can remove an object id from a file with "grep -v" or perl. I would need 
to fire up an editor or hack a longer script if I wanted to fix 
something inside a complex file that does multiple things.

    Simon


* Re: [RFC] shallow clone
  2006-01-30 12:13     ` Johannes Schindelin
  2006-01-30 13:25       ` Simon Richter
@ 2006-01-30 19:25       ` Junio C Hamano
  2006-01-31 11:28         ` Johannes Schindelin
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-30 19:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > - disallow fetching from this repo, and
>> 
>> Why? It's perfectly acceptable to pull from an incomplete
>> repo, as long as you don't care about the old history.
>
> Right. But should that be the default? I don't think so. Therefore: 
> disable it, and if the user is absolutely sure to do dumb things, she'll 
> have to enable it explicitly.

If the downstream person wants to have a shallow history of post
X.org X server core to further hack on it, I do not think of a
reason why we would want to refuse her from cloning a repository
of a fellow developer who has already done such a shallow copy.

If such a clone is done without telling the downstream that the
result is a shallow one, it is "dumb".  I would agree it should
not be done.  We need to propagate the grafts to the downstream
when a clone is done because of this.

By the way, please refrain from discussing .git/config vs
.git/separate-config-files issue in this thread.  My personal
feeling so far is that the information current graft represents
is good enough to support shallow clones, and if not we can
extend its semantics to support such.  It can be discussed
independently if it is a good idea to move the final result
(grafts with updated semantics) to config file.  Even if we end
up not doing any of the shallow cloning support we have been
discussing, moving the information in .git/info/grafts to config
might make sense.  The issue is tangential.


* Re: [RFC] shallow clone
  2006-01-30 19:25       ` Junio C Hamano
@ 2006-01-31 11:28         ` Johannes Schindelin
  2006-01-31 13:05           ` Simon Richter
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 11:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Mon, 30 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> > - disallow fetching from this repo, and
> >> 
> >> Why? It's perfectly acceptable to pull from an incomplete
> >> repo, as long as you don't care about the old history.
> >
> > Right. But should that be the default? I don't think so. Therefore: 
> > disable it, and if the user is absolutely sure to do dumb things, she'll 
> > have to enable it explicitly.
> 
> If the downstream person wants to have a shallow history of post
> X.org X server core to further hack on it, I do not think of a
> reason why we would want to refuse her from cloning a repository
> of a fellow developer who has already done such a shallow copy.

Okay. But in their case, they'll probably do what was done with Linux: 
start afresh. If you want to have the old history, you can import it and 
merge it via a graft.

> If such a clone is done without telling the downstream that the
> result is a shallow one, it is "dumb".  I would agree it should
> not be done.

That was my point. As long as you don't make sure the client handles the 
shallow upstream gracefully, it is dangerous. At the moment, there are too 
many code parts relying on the completeness of the repository (local and 
remote).

Since I wrote this, I realized that the problem I saw is not limited to 
shallow upstream, but there is a subtle issue with shallow downstreams, 
too:

Just imagine this: Alice starts a project, Bob makes a shallow copy from 
it when Alice just reverted an experimental feature. Then, Alice decides 
the experimental feature was not bad at all and reverts the revert. Bob 
pulls from Alice: Alice's upload-pack assumes Bob already has the original 
files (now re-reverted), and Bob ends up with a broken repository.

While writing the last paragraph, it became clear to me that the shallow 
thing is very fragile: IMHO it is impossible to be fully backwards 
compatible (remember: you should not force anybody to upgrade).

> By the way, please refrain from discussing .git/config vs 
> .git/separate-config-files issue in this thread.

Okay. I will shut up on that issue.

> My personal feeling so far is that the information current graft 
> represents is good enough to support shallow clones, and if not we can 
> extend its semantics to support such.

No. The grafts are more powerful. I have quite a few repos here in which I 
heavily work with grafts, and they are no cutoffs for shallow repos. They 
are hard links between different lines of development. For example, I use 
them to map merges in cvsimported projects, thus fixing a shortcoming of 
CVS. Also, you can "add" history.

If you now rely on the grafts file to determine what was a cutoff, you may 
well end up with bogus cutoffs.

Ciao,
Dscho


* Re: [RFC] shallow clone
  2006-01-31 11:28         ` Johannes Schindelin
@ 2006-01-31 13:05           ` Simon Richter
  2006-01-31 13:31             ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Simon Richter @ 2006-01-31 13:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

>>If the downstream person wants to have a shallow history of post
>>X.org X server core to further hack on it, I do not think of a
>>reason why we would want to refuse her from cloning a repository
>>of a fellow developer who has already done such a shallow copy.

> Okay. But in their case, they'll probably do what was done with Linux: 
> start afresh. If you want to have the old history, you can import it and 
> merge it via a graft.

Well, in the Linux case the problem was not knowing what the SHA1 sum of 
the entire Linux history was. In the shallow repo case we know it, so 
there is no point in throwing away that information.

>>If such a clone is done without telling the downstream that the
>>result is a shallow one, it is "dumb".  I would agree it should
>>not be done.

> That was my point. As long as you don't make sure the client handles the 
> shallow upstream gracefully, it is dangerous. At the moment, there are too 
> many code parts relying on the completeness of the repository (local and 
> remote).

Well, the important thing would be that commands that can work (a merge 
only needs to find the most recent common ancestor, etc) do work, and 
commands that cannot ("log") emit sensible diagnostics.

> Just imagine this: Alice starts a project, Bob makes a shallow copy from 
> it when Alice just reverted an experimental feature. Then, Alice decides 
> the experimental feature was not bad at all and reverts the revert. Bob 
> pulls from Alice: Alice's upload-pack assumes Bob already has the original 
> files (now re-reverted), and Bob ends up with a broken repository.

I know far too little about the internal workings for that, but I'd 
assume that in this case Bob's copy starts at the commit that was never 
in question (and he never saw the reverted commit), and Alice's contains 
a commit on top of that. That one should work. But the other way 'round 
is problematic, when Bob starts with a commit that has been reverted in 
Alice's repository. The solution is for Bob to ask Alice's repo for the 
common ancestor of his shallow base and Alice's HEAD. Alice's repo can, 
however, fail to deliver these if there has been a purge since, in that 
case, stuff needs to be merged by hand (but you already have a problem 
if someone clones your repo before you revert changes, so no regression 
here).

> If you now rely on the grafts file to determine what was a cutoff, you may 
> well end up with bogus cutoffs.

Exactly that was my concern earlier; my database design gut feeling 
tells me that information duplication is not good either, hence my 
suggestion to split off these grafts into a separate file in order to 
mark them as cutoff points.

    Simon


* Re: [RFC] shallow clone
  2006-01-31 13:05           ` Simon Richter
@ 2006-01-31 13:31             ` Johannes Schindelin
  2006-01-31 14:23               ` Simon Richter
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 13:31 UTC (permalink / raw)
  To: Simon Richter; +Cc: Junio C Hamano, git

Hi,

On Tue, 31 Jan 2006, Simon Richter wrote:

> Well, the important thing would be that commands that can work (a merge only
> needs to find the most recent common ancestor, etc) do work, and commands that
> cannot ("log") emit sensible diagnostics.

No it would not.

A commit is a very small object which points (among others) to a tree 
object.

A tree object corresponds to a directory (that is, it can point to a 
number of tree and blob objects).

A blob object corresponds to a file (that is, git never parses its 
contents).

If two separate revisions contain the same file (i.e. same contents), this 
is not duplicated, but the corresponding tree objects point to the same 
object.

If you pull, upload-pack will think you have *every* object depending on 
every ref you have stored.

Say you have three revisions, A -> B -> C, and A and C contain the 
same file bla.txt, and the client says it has B, the upstream upload-pack 
assumes you have bla.txt.
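
The A -> B -> C case can be worked through with a toy model.  It
assumes A is the oldest revision, identifies blobs by file name for
brevity, and models the sender as excluding everything reachable
from the commit the client reports; all names are hypothetical.

```python
# Tree contents per revision; A and C share bla.txt, B lacks it.
TREES = {
    "A": {"bla.txt", "main.c"},
    "B": {"main.c"},
    "C": {"bla.txt", "main.c"},
}
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}


def assumed_objects(commit):
    """What the sender assumes a client holding `commit` has:
    the objects of that commit and of all its ancestors."""
    objects, stack = set(), [commit]
    while stack:
        current = stack.pop()
        objects |= TREES[current]
        stack.extend(PARENTS[current])
    return objects


def missing_after_fetch(shallow_at, want):
    """Objects of `want` that a client shallow at `shallow_at` still lacks."""
    client_has = set(TREES[shallow_at])                  # no history behind it
    sent = TREES[want] - assumed_objects(shallow_at)     # sender's omission
    return TREES[want] - (client_has | sent)


print(sorted(missing_after_fetch("B", "C")))  # ['bla.txt']
```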

> I know far too little about the internal workings for that, [...]

I hope I clarified the important aspect.

> > If you now rely on the grafts file to determine what was a cutoff, you may
> > well end up with bogus cutoffs.
> 
> Exactly that was my concern earlier; my database design gut feeling tells me
> that information duplication is not good either, [...]

You only have two choices: you proposed code duplication, and yours truly 
proposed data duplication.

As is known from good database design: a few redundancies here and there 
are typically needed for good performance.

Ciao,
Dscho


* Re: [RFC] shallow clone
  2006-01-31 13:31             ` Johannes Schindelin
@ 2006-01-31 14:23               ` Simon Richter
  0 siblings, 0 replies; 30+ messages in thread
From: Simon Richter @ 2006-01-31 14:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

> If you pull, upload-pack will think you have *every* object depending on 
> every ref you have stored.

Ah, okay. That was the missing information, thanks.

> You only have two choices: you proposed code duplication, and yours truly 
> proposed data duplication.

Erm, if there are multiple places for parsing a grafts file, that needs 
to be addressed as well.

> As is known from good database design: a few redundancies here and there 
> are typically needed for good performance.

Sure, but only if you can "rebuild" all the redundant information reliably.

    Simon


* Re: [RFC] shallow clone
  2006-01-30 11:58   ` Simon Richter
  2006-01-30 12:13     ` Johannes Schindelin
@ 2006-01-30 19:25     ` Junio C Hamano
  2006-01-31  8:37       ` Franck
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-30 19:25 UTC (permalink / raw)
  To: Simon Richter; +Cc: git

Simon Richter <Simon.Richter@hogyros.de> writes:

>> - disallow fetching from this repo, and
>
> Why? It's perfectly acceptable to pull from an incomplete repo, as
> long as you don't care about the old history.

I agree.  As long as the cloned one can record itself as a
shallow one (and with what epochs), I do not see a reason to
forbid second generation clone from a shallow repository.

> Hrm, I think there should also be a way to shrink a repo and "forget"
> old history occasionally (obviously, use of that feature would be
> highly discouraged).

I cannot think of a reason to discourage it, and I think you can
do the "forgetting" part with the current set of tools.  Choosing
appropriate cauterizing points, setting up info/grafts, and
running "repack -a -d" would be sufficient.

> IMO, it may be a lot more robust to just have a list of "cutoff"
> object ids in .git/shallow instead of messing with grafts here, as
> adding or removing a line from that file is an easier thing to do for
> porcelain (or by hand) than rewriting the grafts file. Whether that
> list would be inclusive or exclusive would need to be decided still.

I would rather have neither .git/shallow nor .git/shallow_start.

Cauterizing is not any more special than other grafts entries.
If you have grafted the historical kernel repository behind the
official kernel repository with its 2.6.12-rc2 epoch, I cannot
think of any reason to forbid people from cloning it along with
the grafts.

* Re: [RFC] shallow clone
  2006-01-30 19:25     ` Junio C Hamano
@ 2006-01-31  8:37       ` Franck
  2006-01-31  8:51         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Franck @ 2006-01-31  8:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Simon Richter, git

2006/1/30, Junio C Hamano <junkio@cox.net>:
> Simon Richter <Simon.Richter@hogyros.de> writes:
>
> >> - disallow fetching from this repo, and
> >
> > Why? It's perfectly acceptable to pull from an incomplete repo, as
> > long as you don't care about the old history.
>
> I agree.  As long as the cloned one can record itself as a
> shallow one (and with what epochs), I do not see a reason to
> forbid second generation clone from a shallow repository.
>

I agree too

> Cauterizing is not any more special than other grafts entries.
> If you have grafted the historical kernel repository behind the
> official kernel repository with its 2.6.12-rc2 epoch, I cannot
> think of any reason to forbid people from cloning it along with
> the grafts.
>

I built my public repository from a cauterized one, and everybody who
pulls from mine is aware that the full history is missing, but they
don't actually care. Someone who pulls from my repo wants to work on
my project, which does not need any of the old history...

Thanks
--
               Franck

* Re: [RFC] shallow clone
  2006-01-31  8:37       ` Franck
@ 2006-01-31  8:51         ` Junio C Hamano
  2006-01-31 11:11           ` Franck
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31  8:51 UTC (permalink / raw)
  To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> I built my public repository from a cauterized one, and everybody who
> pulls from mine is aware that the full history is missing, but they
> don't actually care. Someone who pulls from my repo wants to work on
> my project, which does not need any of the old history...

Mind writing up a howto on the topic?

 - How things are set up using the current tool.
 - How others initially clone from you.
 - How others update (pull) from you.
 - What are the pitfalls you and others need to avoid
   (i.e. operations that involve old history)

I brought this up because, according to some mailing list
research, the lack of official support for shallow cloning was
cited as one of the showstoppers by a project that once
considered switching to git but didn't.

* Re: [RFC] shallow clone
  2006-01-31  8:51         ` Junio C Hamano
@ 2006-01-31 11:11           ` Franck
  0 siblings, 0 replies; 30+ messages in thread
From: Franck @ 2006-01-31 11:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2006/1/31, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I built my public repository from a cauterized one, and everybody who
> > pulls from mine is aware that the full history is missing, but they
> > don't actually care. Someone who pulls from my repo wants to work on
> > my project, which does not need any of the old history...
>
> Mind writing up a howto on the topic?

OK, I'll try to sum something up this week; I hope my bad English
will be understandable...

>
>  - How things are set up using the current tool.
>  - How others initially clone from you.
>  - How others update (pull) from you.
>  - What are the pitfalls you and others need to avoid
>    (i.e. operations that involve old history)

Actually, I just discovered one thanks to your first email in this
thread, the one about reverted commits... so I'm probably not the
right person to write this section...

>
> I brought this up because, according to some mailing list
> research, the lack of official support for shallow cloning was
> cited as one of the showstoppers by a project that once
> considered switching to git but didn't.

Again, I wasn't aware that this feature was really needed...

thanks
--
               Franck

* Re: [RFC] shallow clone
  2006-01-30 11:39 ` Johannes Schindelin
  2006-01-30 11:58   ` Simon Richter
@ 2006-01-30 18:46   ` Junio C Hamano
  2006-01-31 11:02     ` [PATCH] Shallow clone: low level machinery Junio C Hamano
                       ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-01-30 18:46 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> . Download the full tree for v2.6.14 commit and store its
>>   objects locally.
>
> On first read, I mistook "tree" for "commit"...

It turns out that this 'single' request step is unneeded, as
long as we implement 'graft' requests.  We can then tell
"Cauterize at v2.6.14 and give me the master" to `upload-pack`.
`upload-pack` would run `rev-list --objects master` and try to
include everything that is reachable from "master", but it would
notice that the v2.6.14 commit does not have any parent (thanks
to the customized graft) and stop there -- the result is the
history since v2.6.14.

>> . Set up `info/grafts` to lie to the local git that Linux kernel
>>   history began at v2.6.14 version.
>
> Maybe also record this in .git/config, so that you can
>
> - disallow fetching from this repo, and
> - easily extend the shallow copy to a larger shallow one, or a full one.

I thought about that before I wrote the message, but it boils
down to grepping lines from grafts that have only one object
name (i.e. cauterizing records), so it is redundant.
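
As a rough illustration of that "grepping", here is a minimal
sketch in C.  It assumes the grafts line format "Commit Parent1
Parent2 ..." with 40-hex-digit object names; the helper names
graft_parent_count and is_cauterizing_record are made up for this
example and do not exist in git.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Count the parents named on one grafts line ("Commit Parent1 ...").
 * Returns -1 for a comment or a malformed line, 0 for a cauterizing
 * record (a commit name with no parents), or the number of parents.
 * Hypothetical helper, not part of git.
 */
static int graft_parent_count(const char *line)
{
	size_t len, i;

	assert(line != NULL);
	len = strlen(line);
	if (len && line[len - 1] == '\n')
		len--;
	if (!len || line[0] == '#')
		return -1;
	/* N names of 40 hex digits joined by single spaces: 41 * N - 1 */
	if ((len + 1) % 41)
		return -1;
	for (i = 0; i < len; i++) {
		if (i % 41 == 40) {
			if (line[i] != ' ')
				return -1;
		} else if (!isxdigit((unsigned char)line[i])) {
			return -1;
		}
	}
	return (int)((len + 1) / 41) - 1;
}

/* A shallow boundary is a graft entry that lists no parents. */
static int is_cauterizing_record(const char *line)
{
	return graft_parent_count(line) == 0;
}
```

Running is_cauterizing_record() over each line of the grafts file
would pick out the cut-off points, which is why a separate record
of them would be redundant.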

Also there is no strict reason to forbid cloning from such a
shallow repository.  No harm is done as long as you make it
clear to somebody who clones from you that what you have is a
shallow copy, so that the cloned repository can cauterize
history at appropriate places.

A second generation clone, when cloning from a shallow
repository, needs to record that it has the same or shallower
history (otherwise a third generation clone from it would not
work).  So the `upload-pack` protocol needs to be updated to
send the grafts information that the `upload-pack` side uses to
the downloader, even when the downloader does not make a 'graft'
request.  Once that is done, you should be able to clone safely
from a shallow repository and end up with a repository with the
same (or shallower -- if you asked to make a shallow clone from
it) history.

> Why not just start another fetch? Then, "have <refs/tags/start_shallow>" 
> would be sent, and upload-pack does the right thing?

Yes, almost.  We need to realize that `upload-pack` that hears
"have A, want B" is allowed to omit objects that appear in
`ls-tree B` output but not in `ls-tree A`.  "have A" means not
just "I have A", but "I have A and all of its ancestors", so
just sending "have start_shallow" (or start_shallow^ for that
matter) is not quite enough [*1*].

> If you absolutely want to get only one pack, which then is stored as-is, 
> upload-pack could start two rev-list processes: one for the tree and one 
> for all the rest.

The message you are responding to did two separate transfers (one
'single', and another 'fetch'); I do not particularly mind doing
two (it is just an initial clone anyway), but as I said, it turns
out that we do not need the initial 'single'.

>> [NOTE]
>> Most likely this is not directly run by the user but is run as
>> the first command invoked by the shallow clone script.
>
> Better make it an option to git-clone

Probably -- I was just outlining the lowest-level mechanism and
haven't thought much about the UI.

[Footnote]

*1* This is true even without the more aggressive optimization
that rev-list does not do yet.  Here is a minimalistic
demonstration: a one-file project with a handful of
straight-line commits.  Each change to the file reverts the
change made by the previous commit.

 * The HEAD commit has "white", the HEAD~1 "black" and HEAD~2
   "white".

 * We say we are interested in things since HEAD~2 (i.e. we
   pretend that the history starts at HEAD~1 and it does not
   have a parent) and ask for HEAD.

 * Notice that only one copy of the file appears in the output.
   It is the "black" blob.  We do not get the "white" blob because
   we are telling it that we _have_ HEAD~2.  The resulting set of
   objects is not enough to check out the HEAD commit.

This roughly corresponds to your "have start_shallow", but not
quite -- in that sequence you have the objects for the HEAD~2 commit.
But the point is that I want to leave the door open for
optimizing upload-pack, so that it can choose to omit objects
that do not appear in A when you say "have A", if the object
appears in one of A's ancestors.

-- >8 --
#!/bin/sh

rm -fr .git

git init-db
zebra=white
echo $zebra >file
git add file
git commit -m initial

for i in 0 1 2 3 4 5
do
	case $zebra in
	white) zebra=black ;;
	black) zebra=white ;;
	esac
	echo $zebra >file
	git commit -a -m "$i $zebra"
done
git rev-list --objects HEAD~2..HEAD |
git name-rev --stdin
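
The same effect can be modelled without a repository.  The sketch
below is a toy model in C, not how rev-list is implemented: it
assumes a one-file, straight-line history like the one the script
builds, and lists a blob only if the "bottom" commit the receiver
claims to have does not already contain it.  The names blob_of,
blobs_sent and the HEAD constant are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Commit 0 is "initial"; the loop in the script adds commits 1..6. */
enum { HEAD = 6 };

/* Blob content recorded by a commit; the file flips on every commit. */
static const char *blob_of(int commit)
{
	return (commit % 2) ? "black" : "white";
}

/*
 * Toy model of "rev-list --objects bottom..top": walk the commits
 * after bottom up to top, and list each blob once unless the bottom
 * commit already has it.  Fills sent[] and returns the blob count.
 */
static int blobs_sent(int bottom, int top, const char *sent[])
{
	int n = 0, c, i;

	assert(bottom <= top);
	for (c = top; c > bottom; c--) {
		const char *b = blob_of(c);
		int seen = !strcmp(b, blob_of(bottom));

		for (i = 0; i < n && !seen; i++)
			seen = !strcmp(b, sent[i]);
		if (!seen)
			sent[n++] = b;
	}
	return n;
}
```

With bottom = HEAD - 2 and top = HEAD, the model lists only the
"black" blob, so a receiver that does not really have HEAD~2 would
be missing the "white" blob it needs to check out HEAD.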

* [PATCH] Shallow clone: low level machinery.
  2006-01-30 18:46   ` Junio C Hamano
@ 2006-01-31 11:02     ` Junio C Hamano
  2006-01-31 13:58       ` Johannes Schindelin
  2006-01-31 14:20     ` [RFC] shallow clone Johannes Schindelin
  2006-01-31 20:59     ` Junio C Hamano
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31 11:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

This adds --shallow=refname option to git-clone-pack, and
extends upload-pack protocol with "shallow" extension.

An example:

	$ mkdir junk && cd junk && git init-db
	$ git clone-pack --shallow=refs/heads/master ../git.git master

This creates a very shallow clone of my repository.  It says
"pretend refs/heads/master commit is the beginning of time, and
clone your master branch".  As before, clone-pack with explicit
head name outputs the commit object name and refname to the
standard output instead of creating the branch.  The command
creates a .git/info/grafts file to cauterize the history at that
commit as well.

I think upload-pack side is more or less ready to be debugged,
but the client side is highly experimental.  It has quite
serious limitations and is more of a proof of correctness at the
protocol extension level than for practical use:

 - Currently it can take only one --shallow option.

 - It has to be spelled in full (refs/heads/master, not
   "master").

 - It has to be included as part of explicit refname list.

 - There is no matching --shallow in git-fetch-pack.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 cache.h       |    9 +++
 clone-pack.c  |   69 ++++++++++++++++++++++-
 commit-tree.c |    5 --
 commit.c      |  174 +++++++++++++++++++++++++++++++++++++++------------------
 commit.h      |   14 +++++
 connect.c     |   24 ++++++++
 object.c      |    7 ++
 object.h      |    2 +
 upload-pack.c |   94 +++++++++++++++++++++++++++++--
 9 files changed, 331 insertions(+), 67 deletions(-)

75f1f4871277f403991c771eb642bdbd6fe82021
diff --git a/cache.h b/cache.h
index bdbe2d6..18d4cdb 100644
--- a/cache.h
+++ b/cache.h
@@ -111,11 +111,18 @@ static inline unsigned int create_ce_mod
 extern struct cache_entry **active_cache;
 extern unsigned int active_nr, active_alloc, active_cache_changed;
 
+/*
+ * Having more than two parents is not strange at all, and this is
+ * how multi-way merges are represented.
+ */
+#define MAXPARENT (16)
+
 #define GIT_DIR_ENVIRONMENT "GIT_DIR"
 #define DEFAULT_GIT_DIR_ENVIRONMENT ".git"
 #define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
 #define INDEX_ENVIRONMENT "GIT_INDEX_FILE"
 #define GRAFT_ENVIRONMENT "GIT_GRAFT_FILE"
+#define GRAFT_INFO_ENVIRONMENT "GIT_GRAFT_INFO"
 
 extern char *get_git_dir(void);
 extern char *get_object_directory(void);
@@ -296,6 +303,8 @@ struct ref {
 	char name[FLEX_ARRAY]; /* more */
 };
 
+extern void send_graft_info(int);
+
 extern int git_connect(int fd[2], char *url, const char *prog);
 extern int finish_connect(pid_t pid);
 extern int path_match(const char *path, int nr, char **match);
diff --git a/clone-pack.c b/clone-pack.c
index f634431..c1708d5 100644
--- a/clone-pack.c
+++ b/clone-pack.c
@@ -1,15 +1,76 @@
 #include "cache.h"
 #include "refs.h"
 #include "pkt-line.h"
+#include "commit.h"
 
 static const char clone_pack_usage[] =
-"git-clone-pack [--exec=<git-upload-pack>] [<host>:]<directory> [<heads>]*";
+"git-clone-pack [--shallow=name] [--exec=<git-upload-pack>] [<host>:]<directory> [<heads>]*";
 static const char *exec = "git-upload-pack";
+static char *shallow = NULL;
+
+static void shallow_exchange(int fd[2], struct ref *ref)
+{
+	char line[1024];
+	char *graft_file;
+	FILE *fp;
+	int i, j;
+
+	while (ref) {
+		if (!strcmp(ref->name, shallow))
+			break;
+		ref = ref->next;
+	}
+	if (!ref)
+		die("No matching ref specified for shallow clone %s",
+		    shallow);
+	if (!server_supports("shallow"))
+		die("The other end does not support shallow clone");
+	packet_write(fd[1], "shallow\n");
+	packet_flush(fd[1]);
+
+	/* Read their graft */
+	prepare_commit_graft();
+	for (;;) {
+		int len;
+		len = packet_read_line(fd[0], line, sizeof(line));
+		if (!len)
+			break;
+		add_graft_info(line);
+	}
+	/* And cauterize at --shallow=<sha1> */
+	sprintf(line, "%s\n", sha1_to_hex(ref->old_sha1));
+	add_graft_info(line);
+
+	/* tell ours */
+	packet_write(fd[1], "custom\n");
+	send_graft_info(fd[1]);
+	packet_flush(fd[1]);
+
+	/* write out ours */
+	graft_file = get_graft_file();
+	fp = fopen(graft_file, "w");
+	if (!fp)
+		die("cannot update grafts!");
+
+	for (i = 0; i < commit_graft_nr; i++) {
+		struct commit_graft *g = commit_graft[i];
+		fputs(sha1_to_hex(g->sha1), fp);
+		for (j = 0; j < g->nr_parent; j++) {
+			fputc(' ', fp);
+			fputs(sha1_to_hex(g->parent[j]), fp);
+		}
+		fputc('\n', fp);
+	}
+	fclose(fp);
+}
 
 static void clone_handshake(int fd[2], struct ref *ref)
 {
 	unsigned char sha1[20];
 
+	if (shallow)
+		shallow_exchange(fd, ref);
+
 	while (ref) {
 		packet_write(fd[1], "want %s\n", sha1_to_hex(ref->old_sha1));
 		ref = ref->next;
@@ -160,6 +221,10 @@ int main(int argc, char **argv)
 				exec = arg + 7;
 				continue;
 			}
+			if (!strncmp("--shallow=", arg, 10)) {
+				shallow = arg + 10;
+				continue;
+			}
 			usage(clone_pack_usage);
 		}
 		dest = arg;
@@ -167,6 +232,8 @@ int main(int argc, char **argv)
 		nr_heads = argc - i - 1;
 		break;
 	}
+	if (shallow && !nr_heads)
+		die("shallow clone needs an explicit head name");
 	if (!dest)
 		usage(clone_pack_usage);
 	pid = git_connect(fd, dest, exec);
diff --git a/commit-tree.c b/commit-tree.c
index 4634b50..cbf2979 100644
--- a/commit-tree.c
+++ b/commit-tree.c
@@ -53,11 +53,6 @@ static void check_valid(unsigned char *s
 	free(buf);
 }
 
-/*
- * Having more than two parents is not strange at all, and this is
- * how multi-way merges are represented.
- */
-#define MAXPARENT (16)
 static unsigned char parent_sha1[MAXPARENT][20];
 
 static const char commit_tree_usage[] = "git-commit-tree <sha1> [-p <sha1>]* < changelog";
diff --git a/commit.c b/commit.c
index 97205bf..a862287 100644
--- a/commit.c
+++ b/commit.c
@@ -102,12 +102,8 @@ static unsigned long parse_commit_date(c
 	return date;
 }
 
-static struct commit_graft {
-	unsigned char sha1[20];
-	int nr_parent;
-	unsigned char parent[0][20]; /* more */
-} **commit_graft;
-static int commit_graft_alloc, commit_graft_nr;
+struct commit_graft **commit_graft;
+int commit_graft_alloc, commit_graft_nr;
 
 static int commit_graft_pos(const unsigned char *sha1)
 {
@@ -128,62 +124,104 @@ static int commit_graft_pos(const unsign
 	return -lo - 1;
 }
 
-static void prepare_commit_graft(void)
+int add_graft_info(char *buf)
 {
-	char *graft_file = get_graft_file();
-	FILE *fp = fopen(graft_file, "r");
+	/* The format is just "Commit Parent1 Parent2 ...\n" */
+	int len = strlen(buf);
+	int i;
+	struct commit_graft *graft = NULL;
+
+	if (buf[len-1] == '\n')
+		buf[--len] = 0;
+	if (buf[0] == '#')
+		return 0;
+	if ((len + 1) % 41) {
+	bad_graft_data:
+		error("bad graft data: %s", buf);
+		free(graft);
+		return -1;
+	}
+	i = (len + 1) / 41 - 1;
+	graft = xmalloc(sizeof(*graft) + 20 * i);
+	graft->nr_parent = i;
+	if (get_sha1_hex(buf, graft->sha1))
+		goto bad_graft_data;
+	for (i = 40; i < len; i += 41) {
+		if (buf[i] != ' ')
+			goto bad_graft_data;
+		if (get_sha1_hex(buf + i + 1, graft->parent[i/41]))
+			goto bad_graft_data;
+	}
+	i = commit_graft_pos(graft->sha1);
+	if (0 <= i) {
+		free(commit_graft[i]);
+		commit_graft[i] = graft;
+		return 0;
+	}
+	i = -i - 1;
+	if (commit_graft_alloc <= ++commit_graft_nr) {
+		commit_graft_alloc = alloc_nr(commit_graft_alloc);
+		commit_graft = xrealloc(commit_graft,
+					sizeof(*commit_graft) *
+					commit_graft_alloc);
+	}
+	if (i < commit_graft_nr)
+		memmove(commit_graft + i + 1,
+			commit_graft + i,
+			(commit_graft_nr - i - 1) *
+			sizeof(*commit_graft));
+	commit_graft[i] = graft;
+	return 0;
+}
+
+void clear_commit_graft(void)
+{
+	int i;
+	for (i = 0; i < commit_graft_nr; i++)
+		free(commit_graft[i]);
+	free(commit_graft);
+	commit_graft_nr = commit_graft_alloc = 0;
+	commit_graft = NULL;
+}
+
+void prepare_commit_graft(void)
+{
+	char *graft_file;
+	FILE *fp;
 	char buf[1024];
+
+	if (getenv(GRAFT_INFO_ENVIRONMENT)) {
+		char *cp, *ep;
+		for (cp = getenv(GRAFT_INFO_ENVIRONMENT);
+		     *cp;
+		     cp = ep) {
+			int more = 0;
+			ep = strchr(cp, '\n');
+			if (ep) {
+				more = 1;
+				*ep = '\0';
+			}
+			else {
+				ep = cp + strlen(cp);
+			}
+			if (ep != cp)
+				add_graft_info(cp);
+			if (!more)
+				break;
+			*ep = '\n';
+			ep++;
+		}
+		return;
+	}
+	graft_file = get_graft_file();
+	fp = fopen(graft_file, "r");
 	if (!fp) {
-		commit_graft = (struct commit_graft **) "hack";
+		commit_graft = (struct commit_graft **) xmalloc(1);
 		return;
 	}
-	while (fgets(buf, sizeof(buf), fp)) {
-		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		int len = strlen(buf);
-		int i;
-		struct commit_graft *graft = NULL;
+	while (fgets(buf, sizeof(buf), fp))
+		add_graft_info(buf);
 
-		if (buf[len-1] == '\n')
-			buf[--len] = 0;
-		if (buf[0] == '#')
-			continue;
-		if ((len + 1) % 41) {
-		bad_graft_data:
-			error("bad graft data: %s", buf);
-			free(graft);
-			continue;
-		}
-		i = (len + 1) / 41 - 1;
-		graft = xmalloc(sizeof(*graft) + 20 * i);
-		graft->nr_parent = i;
-		if (get_sha1_hex(buf, graft->sha1))
-			goto bad_graft_data;
-		for (i = 40; i < len; i += 41) {
-			if (buf[i] != ' ')
-				goto bad_graft_data;
-			if (get_sha1_hex(buf + i + 1, graft->parent[i/41]))
-				goto bad_graft_data;
-		}
-		i = commit_graft_pos(graft->sha1);
-		if (0 <= i) {
-			error("duplicate graft data: %s", buf);
-			free(graft);
-			continue;
-		}
-		i = -i - 1;
-		if (commit_graft_alloc <= ++commit_graft_nr) {
-			commit_graft_alloc = alloc_nr(commit_graft_alloc);
-			commit_graft = xrealloc(commit_graft,
-						sizeof(*commit_graft) *
-						commit_graft_alloc);
-		}
-		if (i < commit_graft_nr)
-			memmove(commit_graft + i + 1,
-				commit_graft + i,
-				(commit_graft_nr - i - 1) *
-				sizeof(*commit_graft));
-		commit_graft[i] = graft;
-	}
 	fclose(fp);
 }
 
@@ -288,6 +326,30 @@ int parse_commit(struct commit *item)
 	return ret;
 }
 
+static void reparse_commit_parents(struct object *o)
+{
+	struct commit *c;
+	struct commit_list *parents;
+	if ((o->type != commit_type) || !o->parsed)
+		return;
+	c = (struct commit *)o;
+	parents = c->parents;
+	o->parsed = 0;
+	while (parents) {
+		struct commit_list *next = parents->next;
+		free(parents);
+		parents = next;
+	}
+	c->parents = NULL;
+	free(c->buffer);
+	c->buffer = NULL;
+}
+
+void reparse_all_parsed_commits(void)
+{
+	for_each_object(reparse_commit_parents);
+}
+
 struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p)
 {
 	struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
diff --git a/commit.h b/commit.h
index 986b22d..abc5b9e 100644
--- a/commit.h
+++ b/commit.h
@@ -17,6 +17,20 @@ struct commit {
 	char *buffer;
 };
 
+struct commit_graft {
+	unsigned char sha1[20];
+	int nr_parent;
+	unsigned char parent[0][20]; /* more */
+};
+
+extern struct commit_graft **commit_graft;
+extern int commit_graft_alloc, commit_graft_nr;
+
+extern void prepare_commit_graft(void);
+extern void clear_commit_graft(void);
+extern int add_graft_info(char *);
+extern void reparse_all_parsed_commits(void);
+
 extern int save_commit_buffer;
 extern const char *commit_type;
 
diff --git a/connect.c b/connect.c
index 3f2d65c..046d1da 100644
--- a/connect.c
+++ b/connect.c
@@ -3,6 +3,7 @@
 #include "pkt-line.h"
 #include "quote.h"
 #include "refs.h"
+#include "commit.h"
 #include <sys/wait.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
@@ -298,6 +299,29 @@ int match_refs(struct ref *src, struct r
 	return 0;
 }
 
+void send_graft_info(int outfd)
+{
+	int i, j;
+	char packet_buf[41*MAXPARENT], *buf;
+
+	for (i = 0; i < commit_graft_nr; i++) {
+		struct commit_graft *g = commit_graft[i];
+		buf = packet_buf;
+		memcpy(buf, sha1_to_hex(g->sha1), 40);
+		buf += 40;
+		if (MAXPARENT <= g->nr_parent)
+			die("insanely big octopus graft with %d parents: %s",
+			    g->nr_parent, sha1_to_hex(g->sha1));
+		for (j = 0; j < g->nr_parent; j++) {
+			*buf++ = ' ';
+			memcpy(buf, sha1_to_hex(g->parent[j]), 40);
+			buf += 40;
+		}
+		*buf = 0;
+		packet_write(outfd, "%s\n", packet_buf);
+	}
+}
+
 enum protocol {
 	PROTO_LOCAL = 1,
 	PROTO_SSH,
diff --git a/object.c b/object.c
index 1577f74..bbcfcd8 100644
--- a/object.c
+++ b/object.c
@@ -252,3 +252,10 @@ int object_list_contains(struct object_l
 	}
 	return 0;
 }
+
+void for_each_object(void (*fn)(struct object *))
+{
+	int i;
+	for (i = 0; i < nr_objs; i++)
+		fn(objs[i]);
+}
diff --git a/object.h b/object.h
index 0e76182..b4c9729 100644
--- a/object.h
+++ b/object.h
@@ -55,4 +55,6 @@ unsigned object_list_length(struct objec
 
 int object_list_contains(struct object_list *list, struct object *obj);
 
+void for_each_object(void (*)(struct object *));
+
 #endif /* OBJECT_H */
diff --git a/upload-pack.c b/upload-pack.c
index d198055..90ea549 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -13,11 +13,16 @@ static const char upload_pack_usage[] = 
 #define WANTED (1U << 2)
 #define MAX_HAS 256
 #define MAX_NEEDS 256
-static int nr_has = 0, nr_needs = 0, multi_ack = 0, nr_our_refs = 0;
+#define MAX_PARENTS 20
+static int nr_has = 0, nr_needs = 0, nr_our_refs = 0;
 static unsigned char has_sha1[MAX_HAS][20];
 static unsigned char needs_sha1[MAX_NEEDS][20];
 static unsigned int timeout = 0;
 
+/* protocol extensions */
+static int multi_ack = 0;
+static int using_custom_graft = 0;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -163,6 +168,77 @@ static int get_common_commits(void)
 	}
 }
 
+static void exchange_grafts(void)
+{
+	int len;
+	char line[41*MAX_PARENTS];
+
+	/* We heard "shallow"; drop up to the next flush */
+	for (;;) {
+		len = packet_read_line(0, line, sizeof(line));
+		reset_timeout();
+		if (!len)
+			break;
+	}
+
+	/* Send our graft */
+	prepare_commit_graft();
+	send_graft_info(1);
+	packet_flush(1);
+
+	/* For precise common commits discovery, we need to use
+	 * the graft information we received from them.
+	 * But this is expensive, so the downloader first says
+	 * if it wants to use our graft as is.
+	 */
+	len = packet_read_line(0, line, sizeof(line));
+	reset_timeout();
+	if (!len)
+		; /* use ours as is */
+	else if (!strcmp(line, "custom\n")) {
+		using_custom_graft = 1;
+		clear_commit_graft();
+		for (;;) {
+			len = packet_read_line(0, line, sizeof(line));
+			reset_timeout();
+			if (!len)
+				break;
+			if (add_graft_info(line))
+				die("Bad graft line %s", line);
+		}
+		/* And using that, we prepare our end. */
+		reparse_all_parsed_commits();
+	}
+	else
+		die("expected 'custom', got '%s'", line);
+}
+
+static void setup_custom_graft(void)
+{
+	char *graft_env = strdup(GRAFT_INFO_ENVIRONMENT "=");
+	int envlen = strlen(graft_env);
+	int i, j;
+
+	for (i = 0; i < commit_graft_nr; i++) {
+		struct commit_graft *g = commit_graft[i];
+		char buf[41*MAX_PARENTS], *ptr;
+		ptr = buf;
+		memcpy(ptr, sha1_to_hex(g->sha1), 40);
+		ptr += 40;
+		for (j = 0; j < g->nr_parent; j++) {
+			*ptr++ = ' ';
+			memcpy(ptr, sha1_to_hex(g->parent[j]), 40);
+			ptr += 40;
+		}
+		*ptr++ = '\n';
+		*ptr = 0;
+		graft_env = xrealloc(graft_env, envlen + (ptr - buf) + 1);
+		memcpy(graft_env + envlen, buf, ptr - buf + 1);
+		envlen += ptr - buf;
+	}
+	putenv(graft_env);
+}
+
 static int receive_needs(void)
 {
 	static char line[1000];
@@ -180,16 +256,22 @@ static int receive_needs(void)
 		sha1_buf = dummy;
 		if (needs == MAX_NEEDS) {
 			fprintf(stderr,
-				"warning: supporting only a max of %d requests. "
+				"warning: supporting only a max of "
+				"%d requests. "
 				"sending everything instead.\n",
 				MAX_NEEDS);
 		}
 		else if (needs < MAX_NEEDS)
 			sha1_buf = needs_sha1[needs];
 
-		if (strncmp("want ", line, 5) || get_sha1_hex(line+5, sha1_buf))
+		if (!strcmp("shallow\n", line)) {
+			exchange_grafts();
+			continue;
+		}
+		if (strncmp("want ", line, 5) ||
+		    get_sha1_hex(line+5, sha1_buf))
 			die("git-upload-pack: protocol error, "
-			    "expected to get sha, not '%s'", line);
+			    "expected to get want-sha1, not '%s'", line);
 		if (strstr(line+45, "multi_ack"))
 			multi_ack = 1;
 
@@ -213,7 +295,7 @@ static int receive_needs(void)
 
 static int send_ref(const char *refname, const unsigned char *sha1)
 {
-	static char *capabilities = "multi_ack";
+	static char *capabilities = "multi_ack shallow";
 	struct object *o = parse_object(sha1);
 
 	if (capabilities)
@@ -243,6 +325,8 @@ static int upload_pack(void)
 	if (!nr_needs)
 		return 0;
 	get_common_commits();
+	if (using_custom_graft)
+		setup_custom_graft();
 	create_pack_file();
 	return 0;
 }
-- 
1.1.6.gefef

* Re: [PATCH] Shallow clone: low level machinery.
  2006-01-31 11:02     ` [PATCH] Shallow clone: low level machinery Junio C Hamano
@ 2006-01-31 13:58       ` Johannes Schindelin
  2006-01-31 17:49         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 13:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

apart from my thinking this is not backward-compatible (you are supposed 
to be able to pull from a complete repo, even if it has a 
non-shallow-capable upload-pack), here are my comments:

- it is good that MAXPARENT and struct commit_graft are in more public 
	places now.

- reparse_* is misleading. Nothing is reparsed, but rather "unparsed".

- I'd hesitate to let git-daemon write temporary files. That is a whole 
	new can of security worms.

- It looks wrong to me to define MAX_PARENTS as 20 in upload-pack.c, when 
	MAXPARENT is defined as 16 in cache.h.

- The custom_graft issue could be handled in a more elegant manner if 
	git was lib'ified (no temporary file). Since that is already the 
	plan, why not do that first, and come back later?

Ciao,
Dscho

* Re: [PATCH] Shallow clone: low level machinery.
  2006-01-31 13:58       ` Johannes Schindelin
@ 2006-01-31 17:49         ` Junio C Hamano
  2006-01-31 18:06           ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31 17:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> apart from my thinking this is not backward-compatible (you are supposed 
> to be able to pull from a complete repo, even if it has a 
> non-shallow-capable upload-pack), here are my comments:

It cannot do a shallow clone against older servers, no.  I think
it should be able to do a full clone from older servers, but I
need to double check -- at least that is how I meant to write
that thing but it was late night ;-).

> - it is good that MAXPARENT and struct commit_graft are in more public 
> 	places now.
>
> - reparse_* is misleading. Nothing is reparsed, but rather "unparsed".

I meant to reparse them there but forgot.  Will remember to fix.

> - I'd hesitate to let git-daemon write temporary files. That is a whole 
> 	new can of security worms.
>
> - The custom_graft issue could be handled in a more elegant manner if 
> 	git was lib'ified (no temporary file). Since that is already the 
> 	plan, why not do that first, and come back later?

That is why it does not write any temporary files.  It
introduces a way to read graft information from an environment
variable.
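
A minimal sketch of that idea, for illustration only: it assumes
the variable carries newline-separated graft records, as the patch
does with GIT_GRAFT_INFO.  Unlike the patch, which temporarily
writes NULs into the environment string, this version copies each
record out.  next_graft_record is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the next non-empty, newline-terminated record from *cursor
 * into out and advance *cursor past it.  Returns 1 when a record was
 * copied, 0 at the end of the value, and -1 if the record does not
 * fit in out.  Hypothetical helper, not part of git.
 */
static int next_graft_record(const char **cursor, char *out, size_t outsz)
{
	const char *p = *cursor;
	size_t len;

	assert(outsz > 0);
	while (*p == '\n')	/* skip empty records */
		p++;
	len = strcspn(p, "\n");
	if (!len) {
		*cursor = p;
		return 0;	/* end of the value */
	}
	if (len >= outsz)
		return -1;	/* record does not fit */
	memcpy(out, p, len);
	out[len] = '\0';
	p += len;
	if (*p == '\n')
		p++;
	*cursor = p;
	return 1;
}
```

A caller would start with the result of getenv("GIT_GRAFT_INFO")
as the cursor and hand each record it gets back to the same parser
that is used for lines read from the grafts file.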

> - It looks wrong to me to define MAX_PARENTS as 20 in upload-pack.c, when 
> 	MAXPARENT is defined as 16 in cache.h.

This is a remnant from my earlier version, which did not move
MAXPARENT out of commit-tree; I forgot to clean it up before
calling it a day.  Will remember to clean up.

* Re: [PATCH] Shallow clone: low level machinery.
  2006-01-31 17:49         ` Junio C Hamano
@ 2006-01-31 18:06           ` Johannes Schindelin
  2006-01-31 18:22             ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 18:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > apart from my thinking this is not backward-compatible (you are supposed 
> > to be able to pull from a complete repo, even if it has a 
> > non-shallow-capable upload-pack), here are my comments:
> 
> It cannot do a shallow clone against older servers, no.

Worse, you cannot pull from older servers into shallow repos.

> > - The custom_graft issue could be handled in a more elegant manner if 
> > 	git was lib'ified (no temporary file). Since that is already the 
> > 	plan, why not do that first, and come back later?
> 
> That is why it does not write any temporary files.  It
> introduces a way to read graft information from an environment
> variable.

Ooops. I only saw that you setup_custom_grafts and assumed wrongly.

Ciao,
Dscho

* Re: [PATCH] Shallow clone: low level machinery.
  2006-01-31 18:06           ` Johannes Schindelin
@ 2006-01-31 18:22             ` Junio C Hamano
  2006-02-01 14:33               ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31 18:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Worse, you cannot pull from older servers into shallow repos.

"have X" means different thing if you do not have matching
grafts information, so I suspect that is fundamentally
unsolvable.

I am not sure you can convince "git-rev-list ^A" to mean "not at
A but things before that is still interesting", especially when
you give many other heads to start traversing from, but if you
can, then you can do things at rev-list command line parameter
level without doing the "exchange and use the same grafts"
trickery.  That _might_ be easier to implement but I do not see
an obvious correctness guarantee in the approach.

Implementation bugs aside, it is obvious the things _would_ work 
correctly with "exchange and use the same grafts" approach.

* Re: [PATCH] Shallow clone: low level machinery.
  2006-01-31 18:22             ` Junio C Hamano
@ 2006-02-01 14:33               ` Johannes Schindelin
  2006-02-01 20:27                 ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-01 14:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Worse, you cannot pull from older servers into shallow repos.
> 
> "have X" means different thing if you do not have matching
> grafts information, so I suspect that is fundamentally
> unsolvable.

If the shallow-capable client could realize that the server is not 
shallow-capable *and* the local repo is shallow, and refuse to operate 
(unless called with "-f", in which case the result may or may not be a 
broken repo, which has to be fixed up manually by copying 
over ORIG_HEAD to HEAD).

Of course, the client has to know that the local repo is shallow, which it 
must not determine by looking at the grafts file.
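
The refusal rule described above can be sketched in C roughly as follows.
This is only an illustration: the function name and its three flags are
hypothetical and do not come from the patch under discussion.

```c
/*
 * Hypothetical sketch: decide whether a fetch may proceed.
 * Returns 1 to proceed, 0 to refuse.  All three inputs are
 * assumed to be known to the client before the transfer starts.
 */
int shallow_fetch_allowed(int server_shallow_capable,
			  int local_is_shallow,
			  int force)
{
	/* A complete local repo can talk to any server. */
	if (!local_is_shallow)
		return 1;
	/* Both ends understand shallow repos. */
	if (server_shallow_capable)
		return 1;
	/* Shallow repo against an older server: only with "-f". */
	return force ? 1 : 0;
}
```

With these assumptions, only the combination "shallow repo, older server,
no -f" is refused; every other combination proceeds.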

> I am not sure you can convince "git-rev-list ^A" to mean "not at
> A but things before that is still interesting", especially when
> you give many other heads to start traversing from, but if you
> can, then you can do things at rev-list command line parameter
> level without doing the "exchange and use the same grafts"
> trickery.  That _might_ be easier to implement but I do not see
> an obvious correctness guarantee in the approach.

If you introduce a different "have X" -- like "have-no-parent X" -- and 
teach git-rev-list that "~A" means "traverse the tree of A, but not A's 
parents", you'd basically have everything you need, right?

> Implementation bugs aside, it is obvious the things _would_ work 
> correctly with "exchange and use the same grafts" approach.

Yes, I agree. But again, the local repo has to know which grafts were 
introduced by making the repo shallow.

Ciao,
Dscho

* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-01 14:33               ` Johannes Schindelin
@ 2006-02-01 20:27                 ` Junio C Hamano
  2006-02-02  0:48                   ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > Worse, you cannot pull from older servers into shallow repos.
>> 
>> "have X" means different thing if you do not have matching
>> grafts information, so I suspect that is fundamentally
>> unsolvable.
>
> If the shallow-capable client could realize that the server is not 
> shallow-capable *and* the local repo is shallow, and refuse to operate 
> (unless called with "-f", in which case the result may or may not be a 
> broken repo, which has to be fixed up manually by copying 
> over ORIG_HEAD to HEAD).

"If ... refuse to operate" then?  If "Then that is OK" is what
you meant to say I agree (I meant to code the client code that
way but I started only with the initial clone).  I said
"fundamentally unsolvable" because I thought you wanted it to do
something sensible without refusing even in such a case.

> Of course, the client has to know that the local repo is shallow, which it 
> must not determine by looking at the grafts file.

Sorry, I fail to understand this requirement.  Why is it "it must not"?

> If you introduce a different "have X" -- like "have-no-parent X" -- and 
> teach git-rev-list that "~A" means "traverse the tree of A, but not A's 
> parents", you'd basically have everything you need, right?

If you have such a modified rev-list, yes.  I was having doubts
about keeping an obvious correctness guarantee when doing such
"rev-list ~A".

> Yes, I agree. But again, the local repo has to know which grafts were 
> introduced by making the repo shallow.

I am not sure I understand.  grafts are grafts are grafts.  If
the other side has grafts to connect otherwise unrelated commit
objects, I suspect the cloner needs to know about them, all of
them, in order to use the resulting clone.  Also the upstream
side would need to know the altered world view the cloner has to
adjust the commit ancestry graph, at least during the cloning
and fetching, and I do not think it should be limited only to
cauterizing entries created by earlier shallow clone operations.
Manually created cauterizing entries should also count (and, for
that matter, grafts that stitch unrelated lines together), no?

* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-01 20:27                 ` Junio C Hamano
@ 2006-02-02  0:48                   ` Johannes Schindelin
  2006-02-02  1:17                     ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-02  0:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> > Worse, you cannot pull from older servers into shallow repos.
> >> 
> >> "have X" means different thing if you do not have matching
> >> grafts information, so I suspect that is fundamentally
> >> unsolvable.
> >
> > If the shallow-capable client could realize that the server is not 
> > shallow-capable *and* the local repo is shallow, and refuse to operate 
> > (unless called with "-f", in which case the result may or may not be a 
> > broken repo, which has to be fixed up manually by copying 
> > over ORIG_HEAD to HEAD).
> 
> "If ... refuse to operate" then?

Just skip the "If". I'll start to enclose all emails I write in 
<tired>..</tired> blocks.

> > Of course, the client has to know that the local repo is shallow, which it 
> > must not determine by looking at the grafts file.
> 
> Sorry, I fail to understand this requirement.  Why is it "it must not"?

See below.

> > If you introduce a different "have X" -- like "have-no-parent X" -- and 
> > teach git-rev-list that "~A" means "traverse the tree of A, but not A's 
> > parents", you'd basically have everything you need, right?
> 
> If you have such a modified rev-list, yes.  I was having doubts
> about keeping an obvious correctness guarantee when doing such
> "rev-list ~A".

I think it would be trivial: just resolve ~A to the tree A points to:

-- snip --
[PATCH] rev-list: Support "~treeish"

Now, "git rev-list --objects ~some_rev" traverses just the tree of
some_rev.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 rev-list.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

43267e65c9ad933ad1a49005c4b61c23adaec372
diff --git a/rev-list.c b/rev-list.c
index 8012762..a196110 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -720,6 +720,24 @@ static void handle_one_commit(struct com
 	commit_list_insert(com, lst);
 }
 
+static void handle_tree(const unsigned char *sha1)
+{
+	struct object *object;
+
+	object = parse_object(sha1);
+	if (!object)
+		die("bad object %s", sha1_to_hex(sha1));
+
+	if (object->type == tree_type)
+		add_pending_object(object, "");
+	else if (object->type == commit_type) {
+		struct commit *commit = (struct commit *)object;
+		if (parse_commit(commit) < 0)
+			die("unable to parse commit %s", sha1_to_hex(sha1));
+		add_pending_object(&(commit->tree->object), "");
+	}
+}
+
 /* for_each_ref() callback does not allow user data -- Yuck. */
 static struct commit_list **global_lst;
 
@@ -865,6 +883,11 @@ int main(int argc, const char **argv)
 			flags = UNINTERESTING;
 			arg++;
 			limited = 1;
+		} else if (*arg == '~') {
+			if (get_sha1(arg + 1, sha1) < 0)
+				die("cannot get '%s'", arg);
+			handle_tree(sha1);
+			continue;
 		}
 		if (get_sha1(arg, sha1) < 0) {
 			struct stat st;
-- 
1.1.4.g9bd9d-dirty
-- snap --

> > Yes, I agree. But again, the local repo has to know which grafts were 
> > introduced by making the repo shallow.
> 
> I am not sure I understand.  grafts are grafts are grafts.

Exactly. And grafts are grafts are not necessarily cutoffs.

Now, is it possible that a fetch does something unintended, when there are 
grafts which are not cutoffs? I don't know yet, but I think so.

Ciao,
Dscho

* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02  0:48                   ` Johannes Schindelin
@ 2006-02-02  1:17                     ` Junio C Hamano
  2006-02-02 18:44                       ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-02-02  1:17 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> If you have such a modified rev-list, yes.  I was having doubts
>> about keeping an obvious correctness guarantee when doing such
>> "rev-list ~A".
>
> I think it would be trivial: just resolve ~A to the tree A points to:

<tired> Hmph.  I thought you meant "have-only A" to mean similar
to "have A" but additionally "do not assume I have things behind
A", and are going to extend rev-list to support ~A syntax to do
that.  I am a bit surprised to see your "rev-list ~A" is to
include A, not exclude A and not what are behind A.  Where is
the connection between this and "have-only A"?  </tired> ;-)

>> > Yes, I agree. But again, the local repo has to know which grafts were 
>> > introduced by making the repo shallow.
>> 
>> I am not sure I understand.  grafts are grafts are grafts.
>
> Exactly. And grafts are grafts are not necessarily cutoffs.
>
> Now, is it possible that a fetch does something unintended, when there are 
> grafts which are not cutoffs? I don't know yet, but I think so.

I think we are disagreeing, so "not Exactly".  I meant "grafts
are grafts, there is no cutoffs, they are also just grafts".  So
the answer to your question is "it does not matter".

* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02  1:17                     ` Junio C Hamano
@ 2006-02-02 18:44                       ` Johannes Schindelin
  2006-02-02 19:31                         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-02 18:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > I think it would be trivial: just resolve ~A to the tree A points to:
> 
> <tired> Hmph.  I thought you meant "have-only A" to mean similar
> to "have A" but additionally "do not assume I have things behind
> A", and are going to extend rev-list to support ~A syntax to do
> that.  I am a bit surprised to see your "rev-list ~A" is to
> include A, not exclude A and not what are behind A.  Where is
> the connection between this and "have-only A"?  </tired> ;-)

<tired> My patch was wrong. You'd have to introduce a new flag saying: 
Traverse this commit, but mark its parents as uninteresting. </tired>
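
A toy sketch of such a flag, in C, might look like this.  It is not
rev-list code: the struct, the CUTOFF flag and the function names are
all made up for illustration, and it only counts commits instead of
listing objects.

```c
#define TOY_MAX_PARENTS 4

#define SEEN          (1u << 0)
#define UNINTERESTING (1u << 1)
#define CUTOFF        (1u << 2)	/* hypothetical: "traverse me, not my parents" */

struct toy_commit {
	unsigned flags;
	int nr_parents;
	struct toy_commit *parent[TOY_MAX_PARENTS];
};

/* Mark a commit and everything behind it as uninteresting. */
static void mark_uninteresting(struct toy_commit *c)
{
	int i;

	if (c->flags & UNINTERESTING)
		return;
	c->flags |= UNINTERESTING;
	for (i = 0; i < c->nr_parents; i++)
		mark_uninteresting(c->parent[i]);
}

/* Count commits reachable from c that are still interesting. */
static int walk(struct toy_commit *c)
{
	int i, n = 1;

	if (c->flags & (SEEN | UNINTERESTING))
		return 0;
	c->flags |= SEEN;
	for (i = 0; i < c->nr_parents; i++)
		n += walk(c->parent[i]);
	return n;
}

int count_interesting(struct toy_commit *all, int nr, struct toy_commit *tip)
{
	int i, j;

	/* Pass 1: a CUTOFF commit itself stays, but its parents are pruned. */
	for (i = 0; i < nr; i++)
		if (all[i].flags & CUTOFF)
			for (j = 0; j < all[i].nr_parents; j++)
				mark_uninteresting(all[i].parent[j]);
	/* Pass 2: ordinary walk from the tip. */
	return walk(tip);
}
```

The marking is done in a separate first pass so that the result does not
depend on the order in which the walk happens to reach the cut-off commit.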

> > Now, is it possible that a fetch does something unintended, when there are 
> > grafts which are not cutoffs? I don't know yet, but I think so.
> 
> I think we are disagreeing, so "not Exactly".  I meant "grafts
> are grafts, there is no cutoffs, they are also just grafts".  So
> the answer to your question is "it does not matter".

Scenario: I have cvsimported a project. Using a graft, I told git that a 
certain commit is indeed a merge between two branches. That is, in 
addition to the parent the commit objects tells us about, it has another 
parent which was tip of another branch.

How would this graft be interpreted by the server we want to pull from? As 
if we had cut off the history. Which we did not. In effect, we could be 
sent many, many objects we already have.

Ciao,
Dscho

* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02 18:44                       ` Johannes Schindelin
@ 2006-02-02 19:31                         ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-02-02 19:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Scenario: I have cvsimported a project. Using a graft, I told git that a 
> certain commit is indeed a merge between two branches. That is, in 
> addition to the parent the commit objects tells us about, it has another 
> parent which was tip of another branch.
>
> How would this graft be interpreted by the server we want to pull from? As 
> if we had cut off the history. Which we did not. In effect, we could be 
> sent many, many objects we already have.

I thought the protocol is sending the full graft file both ways.
The uploader says "here are the grafts I have and use", and the
downloader modifies it and sends back what grafts it wants to
be used during the common revision discovery (aka building
rev-list parameters).  The most important modification during
this exchange is to cauterize the history at --since=v2.6.14
commit (or tag).

The uploader may not have the fake parent you grafted onto a
commit.  You may have a graft entry that says commit W has X, Y
and Z as its parents, when its real parent is only X.  Y may be
some other commit in the project (i.e. the other end knows about
it but it is not a real parent of W), and Z may be from a
development track that the uploader has not even heard of.  You
may say a commit V does not have parent but that commit itself
is from a separate development track the uploader does not know
about.

The uploader, however, should be able to at least honour, modulo
implementation bugs ;-), "X and Y are both parents of W" part.
Just ignoring V and Z and keeping usable part of information
would be a reasonable fallback position [*1*].  And that should
not result in a "many objects" situation when the downloader
says "Now I happen to have W, do not send things reachable from
it".  The uploader side should be able to omit what are
reachable from X or Y even though it cannot exclude things
reachable from Z.  Because the uploader does not even have Z,
there is no reason to worry about things reachable from Z being
sent unnecessarily to the downloader.
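
As a rough sketch of that fallback (not the experimental code itself),
the uploader could trim each received graft entry down to the part it
can use.  The struct and function names below are hypothetical, and
commits are plain strings instead of object names; the limit of 16 is
borrowed from MAXPARENT in cache.h purely for illustration.

```c
#include <string.h>

#define TOY_MAX_GRAFT_PARENTS 16

struct toy_graft {
	const char *commit;		/* the commit being grafted */
	int nr_parents;
	const char *parent[TOY_MAX_GRAFT_PARENTS];
};

/* Does the uploader have this commit at all? */
static int knows(const char **known, int nr_known, const char *name)
{
	int i;

	for (i = 0; i < nr_known; i++)
		if (!strcmp(known[i], name))
			return 1;
	return 0;
}

/*
 * Returns 0 if the whole entry must be ignored (the grafted commit,
 * like V, is unknown), or 1 if it is kept.  Unknown parents, like Z,
 * are dropped in place.
 */
int trim_graft(struct toy_graft *g, const char **known, int nr_known)
{
	int i, kept = 0;

	if (!knows(known, nr_known, g->commit))
		return 0;
	for (i = 0; i < g->nr_parents; i++)
		if (knows(known, nr_known, g->parent[i]))
			g->parent[kept++] = g->parent[i];
	g->nr_parents = kept;
	return 1;
}
```

For an uploader that knows W, X and Y, the entry "W: X Y Z" is kept as
"W: X Y", and an entry for V is ignored.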

At least that was the intention.  "graft" messages are not about
sending "here are the cut-off points"; it is to agree on the
graft information both ends use during the common revision
computation.  The experimental code does not treat cut-offs any
differently from other grafts.

[Footnote]

*1* we might want to enhance the "shallow" protocol further to
do this exchange slightly differently.  The downloader first
sends its grafts (which may contain parents or graft/cutoff
points that the uploader does not have), and the uploader adjusts
the received grafts for commits like V and parents like Z and
then adds its own grafts.  The result is sent back to the
downloader and that becomes the common set of grafts in effect
during the common revision discovery.  This would contain
commits and parents that the downloader does not yet have but
that is not a problem for common revision discovery.  After the
transfer is done, the downloader would adjust its "graft" file
if it made a new shallow clone, but otherwise it should not use
the information it received from the uploader, because things
like V and Z are not in this list.  I _think_ it would suffice
to look at each graft entry and to add that entry locally if it
talks about a commit the downloader does not have in its graft
file.
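
A minimal sketch of that last step, assuming a graft entry is just a
commit name plus its list of fake parents.  The struct and function
names are hypothetical.

```c
#include <string.h>

struct toy_graft_line {
	const char *commit;	/* commit the entry talks about */
	const char *parents;	/* space-separated fake parents, may be "" */
};

/*
 * Append each received entry whose commit has no entry in the local
 * graft list yet; existing local entries are left untouched.
 * Returns the new number of local entries (never more than capacity).
 */
int add_new_grafts(struct toy_graft_line *local, int nr_local, int capacity,
		   const struct toy_graft_line *received, int nr_received)
{
	int i, j;

	for (i = 0; i < nr_received; i++) {
		int found = 0;

		for (j = 0; j < nr_local; j++) {
			if (!strcmp(local[j].commit, received[i].commit)) {
				found = 1;
				break;
			}
		}
		if (found)
			continue;	/* keep the downloader's own view */
		if (nr_local >= capacity)
			break;		/* no room left in this toy array */
		local[nr_local++] = received[i];
	}
	return nr_local;
}
```

Note that a local entry always wins over a received entry for the same
commit, which is what keeps things like V and Z intact on the downloader
side.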

* Re: [RFC] shallow clone
  2006-01-30 18:46   ` Junio C Hamano
  2006-01-31 11:02     ` [PATCH] Shallow clone: low level machinery Junio C Hamano
@ 2006-01-31 14:20     ` Johannes Schindelin
  2006-01-31 20:59     ` Junio C Hamano
  2 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 14:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Mon, 30 Jan 2006, Junio C Hamano wrote:

> We need to realize that `upload-pack` that hears
> "have A, want B" is allowed to omit objects that appear in
> `ls-tree B` output but not in `ls-tree A`.  "have A" means not
> just "I have A", but "I have A and all of its ancestors", so
> just sending "have start_shallow" (or start_shallow^ for that
> matter) is not quite enough.

So how about adding a "have-single A" which would be translated to 
"git-rev-list ~A", which in turn would only mark the tree and its 
children, but not the parents?

Ciao,
Dscho

* Re: [RFC] shallow clone
  2006-01-30 18:46   ` Junio C Hamano
  2006-01-31 11:02     ` [PATCH] Shallow clone: low level machinery Junio C Hamano
  2006-01-31 14:20     ` [RFC] shallow clone Johannes Schindelin
@ 2006-01-31 20:59     ` Junio C Hamano
  2006-02-01 14:47       ` Johannes Schindelin
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31 20:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

This is whacky, but another completely different strategy is to
introduce remote alternates.

If you can allow objects/info/alternates to name a repository
that is not on the local disk, we can set the original remote
repository we "clone" from as one of the alternates, and teach
read_sha1_file() to locally cache objects we read from remote
alternates.

After such a "shallow clone", the user may want to prime the
cache by something like:

	$ git-rev-list --objects v2.6.14..master |
          git-pack-objects --stdout >/dev/null

before going offline.  Obviously you can keep the resulting pack
instead of leaving things loose.

I am not seriously advocating this yet -- adding calls to http
and git transfer machinery in read_sha1_file(), which is as low
level as you can go, is not something I have the guts to do at
the moment.

* Re: [RFC] shallow clone
  2006-01-31 20:59     ` Junio C Hamano
@ 2006-02-01 14:47       ` Johannes Schindelin
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-01 14:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> This is whacky, but another completely different strategy is to
> introduce remote alternates.

I'd rather go with the original plan. After all, you do not really need 
the cut-off commit objects. All needed objects are available on the server 
side: it just has to have a way to know which ones to send.

Ciao,
Dscho

[parent not found: <43DF1F1D.1060704@innova-card.com>]

* Re: [RFC] shallow clone
       [not found] ` <43DF1F1D.1060704@innova-card.com>
@ 2006-01-31  9:00   ` Franck
  0 siblings, 0 replies; 30+ messages in thread
From: Franck @ 2006-01-31  9:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

2006/1/31, Franck Bui-Huu <fbh.work@gmail.com>:
> Junio C Hamano wrote:
> > Shallow History Cloning
> > =======================
> >
> > One good thing about git repository is that each clone is a
> > freestanding and complete entity, and you can keep developing in
> > it offline, without talking to the outside world, knowing that
> > you can sync with them later when online.
> >

Would we be able to make a public repository from such a repo?

> > It is also a bad thing.  It gives people working on projects
> > with long development history stored in CVS a heart attack when
> > we tell them that their clones need to store the whole history.
> >

Yeah, and I haven't survived :)
I didn't notice that other people were asking for this feature; that's great!

> > There was a suggestion by Linus to allow a partial clone using a
> > syntax like this:

[snip]

> >
> > There are some issues.
> >
> > . In the fetch above to obtain everything after v2.6.14, and
> >   future runs of `git fetch origin`, if a blob that is in the
> >   commit being fetched happens to match what used to be in a
> >   commit that is older than v2.6.14 (e.g. a patch was reverted),
> >   `upload-pack` running on the other end is free to omit sending
> >   it, because we are telling it that we are up to date with
> >   respect to v2.6.14.  Although I think the current `rev-list
> >   --objects` implementation does not always do such a revert
> >   optimization if the revert is to a blob in a revision that is
> >   sufficiently old, it is free to optimize more aggressively in
> >   the future.
> >

Oops, I wasn't aware of that. I can still resolve this issue by hand, no?

> > . Later when the user decides to fetch older history, the
> >   operation can become a bit cumbersome.
> >

[snip]

> >
> > Design
> > ------
> >
> > First, to bootstrap the process, we would need to add a way to
> > obtain all objects associated with a commit.  We could do a new
> > program, or we could implement this as a protocol extension to
> > `upload-pack`.  My current inclination is the latter.

Is the document in "Documentation/technical/pack-protocol.txt"
up to date? I can't find anything on multi_ack, for example.

> >
> > When talking with `upload-pack` that supports this extension,
> > the downloader can give one commit object name and get a pack
> > that contains all the objects in the tree associated with that
> > commit, plus the commit object itself.  This is a rough
> > equivalent of running the commit walker with the `-t` flag.

[snip]

> >
> >
> > Anybody want to try?
> >

Well, you have almost done the job with your analysis, but I have never
looked at git's deep internals, and with my lack of time it would take
too long...

Thanks
--
               Franck
