* [RFC] shallow clone
From: Junio C Hamano @ 2006-01-30 7:18 UTC
To: git

Shallow History Cloning
=======================

One good thing about a git repository is that each clone is a freestanding and complete entity, and you can keep developing in it offline, without talking to the outside world, knowing that you can sync with them later when online.

It is also a bad thing. It gives people working on projects with long development history stored in CVS a heart attack when we tell them that their clones need to store the whole history.

There was a suggestion by Linus to allow a partial clone using a syntax like this:

    $ git clone --since=v2.6.14 git://.../linux-2.6/ master

Here is an outline of what changes are needed to the current core to do this.

Strategy
--------

We have `info/grafts` mechanism to fake parent information for commit objects. Using this facility, we could roughly do:

. Download the full tree for v2.6.14 commit and store its objects locally.

. Set up `info/grafts` to lie to the local git that Linux kernel history began at v2.6.14 version.

. Run `git fetch git://.../linux-2.6 master`, with a local ref pointing at v2.6.14 commit, to pretend that we have everything up to v2.6.14 to `upload-pack` running on the other end.

. Update the `origin` branch with the master commit object name we just fetched from Linus.

There are some issues.

. In the fetch above to obtain everything after v2.6.14, and in future runs of `git fetch origin`, if a blob that is in the commit being fetched happens to match what used to be in a commit that is older than v2.6.14 (e.g. a patch was reverted), `upload-pack` running on the other end is free to omit sending it, because we are telling it that we are up to date with respect to v2.6.14.
  Although I think the current `rev-list --objects` implementation does not always do such a revert optimization if the revert is to a blob in a revision that is sufficiently old, it is free to optimize more aggressively in the future.

. Later, when the user decides to fetch older history, the operation can become a bit cumbersome.

I think the latter one is cumbersome but doable -- we could do the equivalent of:

    $ git clone --since=v2.6.13 origin v2.6.14

place all the objects obtained by such a clone/fetch operation, and remember that now we have history beginning at v2.6.13. So let's worry about that later.

For the first issue, we need to have the other end cooperate while fetching from it. If the other end also thinks the development started at v2.6.14, even if we tell it that we have the history up to v2.6.14 (or a commit we obtained since then), there is no way for `upload-pack` running there to optimize too aggressively and assume we have a blob that appeared in v2.6.13. More simply, we do not have to tell them we have anything -- if the other end thinks the epoch is at v2.6.14, only commits that come later will be sent to us.

Design
------

First, to bootstrap the process, we would need to add a way to obtain all objects associated with a commit. We could do a new program, or we could implement this as a protocol extension to `upload-pack`. My current inclination is the latter.

When talking with an `upload-pack` that supports this extension, the downloader can give one commit object name and get a pack that contains all the objects in the tree associated with that commit, plus the commit object itself. This is a rough equivalent of running the commit walker with the `-t` flag.

Another functionality we would need is to tell `upload-pack` to use `info/grafts` of downloader's choice.
With this, after fetching the objects for v2.6.14 commit, the downloader can set up its own grafts file to cauterize the development history at v2.6.14, and tell the `upload-pack` to pretend the kernel history starts at that commit, while sending the tip of Linus' development track to us.

Using the extended protocol (let's call it the 'shallow' extension), a clone that creates a repository with only the recent kernel history since v2.6.14 goes like this:

The first phase is to fetch v2.6.14 itself.

[NOTE]
Most likely this is not directly run by the user but is run as the first command invoked by the shallow clone script.

1. The `fetch-pack` command acquires a new option, `--single`:

        $ git-fetch-pack --single git://.../linux-2.6/ v2.6.14

   This talks with `upload-pack` on the kernel.org server via `git-daemon`.

2. `upload-pack` tells the fetcher what commits it has, what their refs are, and what protocol extensions it supports, as usual.

3. If the fetcher does not see the `shallow` extension supported, there is no way to get a single tree, so things fail here. Otherwise, it sends a `single X{40}\0` request instead of the usual `want` line. The object name sent here is the desired commit.

4. `upload-pack` notices this is a single commit request, and sends an ACK if it can satisfy the request (or a NAK if it can't, e.g. it does not have the asked commit). Instead of doing the usual `get_common_commits` followed by `create_pack_file`, it does:

        $ git rev-list -n1 --objects $commit | git pack-objects --stdout

   and sends the result out.

5. The fetcher checks the ACK and receives the objects.

After the above exchange, we have downloaded the v2.6.14 commit and its objects but not its history. `git-fetch-pack` would output the tag object name for `v2.6.14`, and we would stash it away in `$GIT_DIR/FETCH_HEAD` as usual. Then we set up `info/grafts` with this:

    $ git rev-parse FETCH_HEAD^{commit} >"$GIT_DIR/info/grafts"

This cauterizes the history on our end.
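The first phase can be pictured with a toy Python model of what the proposed `single` request would return and of the one-line graft that follows it. This is a sketch under stated assumptions, not git code. Objects are dicts keyed by short fake names that stand in for 40-hex object names, and `single_request`, `tree_closure` and `graft_line` are hypothetical helpers. It shows that the pack holds the commit and its whole tree but none of its parents. It also shows that an `info/grafts` line naming a commit with no parents after it makes that commit a root.

```python
# Toy model, not git code: fake object names stand in for 40-hex ids.

def tree_closure(store, tree_id):
    """tree_id plus every tree and blob reachable below it."""
    found, stack = [], [tree_id]
    while stack:
        oid = stack.pop()
        if oid in found:
            continue
        found.append(oid)
        if store[oid]["type"] == "tree":
            stack.extend(store[oid]["entries"])
    return found


def single_request(store, commit_id):
    """Hypothetical 'single <commit>' answer: the commit and its tree,
    but none of its parents (cf. rev-list -n1 --objects)."""
    return [commit_id] + tree_closure(store, store[commit_id]["tree"])


def graft_line(commit_id, parents=()):
    """One info/grafts line: a commit followed by its fake parents.
    With no parents listed, the commit is treated as a root."""
    return " ".join([commit_id, *parents]) + "\n"


server = {
    "v2.6.13": {"type": "commit", "tree": "t13", "parents": []},
    "t13": {"type": "tree", "entries": ["old.c"]},
    "old.c": {"type": "blob"},
    "v2.6.14": {"type": "commit", "tree": "t14", "parents": ["v2.6.13"]},
    "t14": {"type": "tree", "entries": ["sub", "main.c"]},
    "sub": {"type": "tree", "entries": ["util.c"]},
    "main.c": {"type": "blob"},
    "util.c": {"type": "blob"},
}

pack = single_request(server, "v2.6.14")
print(sorted(pack))                 # v2.6.13 and old.c are not included
print(repr(graft_line("v2.6.14")))  # what would go into info/grafts
```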
The second phase of the shallow clone is to fetch the history since v2.6.14 to the tip.

1. The `fetch-pack` command is run as usual. Most likely the command line run by the shallow clone script would be:

        $ git fetch-pack git://.../linux-2.6/ master

   Notice there is nothing magical about it. It is just business as usual.

2. `upload-pack` does its usual greeting to the downloader.

3. We notice the `shallow` extension again, and first send out a `graft X{40}\0` request. The syntax of a graft request would be `graft ` followed by one or more commit object names on a line, separated with SP. After sending out all the needed graft requests (in this example there is only one, to cauterize the history at v2.6.14), it does the usual `want X{40}\0multi_ack` and a flush.

4. `upload-pack` notices the graft requests, reinitializes its graft information with what it receives from the other end, and then records the `want`.

5. After the above steps, the usual `upload-pack` vs `fetch-pack` exchange continues, and the objects needed to complete Linus' tip of development for somebody who has v2.6.14 are sent in a pack. The difference from the usual operation is that `upload-pack` during this run thinks the v2.6.14 commit does not have any parent.

The exact sequence from the second part of the initial "shallow clone" can be used for further updates.

There is a small issue about the actual implementation. In the above description I pretended that `upload-pack` can be told to use phony grafts information, but in the current implementation the program that needs to use phony grafts information is the `rev-list` spawned from it. We _could_ make the GIT_GRAFT_FILE environment variable point at a temporary file while we do so, but I'd like to avoid using a temporary file if possible, given that `upload-pack` is run from `git-daemon`. Maybe we could give a --read-graft-from-stdin flag to `rev-list` for this purpose.

Anybody want to try?
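The effect of the proposed `graft` request in steps 3 to 5, together with the revert problem from the Strategy section, can be pictured with a small Python model. It is a hypothetical sketch. Pkt-line framing is left out and object names are fake. The set difference `reachable(wants) - reachable(haves)` is a simplification of what upload-pack and rev-list actually negotiate. A `graft a b c` line is assumed to read like an `info/grafts` line, meaning commit `a` with fake parents `b` and `c`. Without the client's graft, the server believes the client already holds `quirk.c` because that blob sits in v2.6.13. With the graft, the server sends it.

```python
# Hypothetical model: request lines are plain strings, names are fake,
# and "graft a b c" is read like an info/grafts line.

def parse_grafts(lines):
    """Turn 'graft <commit> [<parent>...]' request lines into a dict."""
    grafts = {}
    for line in lines:
        if line.startswith("graft "):
            commit, *parents = line[len("graft "):].split(" ")
            grafts[commit] = parents
    return grafts


def reachable(store, tips, grafts):
    """All objects reachable from tips, with grafts overriding parents."""
    seen, stack = set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = store[oid]
        if obj["type"] == "commit":
            stack.append(obj["tree"])
            stack.extend(grafts.get(oid, obj["parents"]))
        elif obj["type"] == "tree":
            stack.extend(obj["entries"])
    return seen


def objects_to_send(store, wants, haves, grafts):
    return reachable(store, wants, grafts) - reachable(store, haves, grafts)


# v2.6.13 has blob quirk.c; v2.6.14 drops it; "tip" reverts that change.
store = {
    "v2.6.13": {"type": "commit", "tree": "t13", "parents": []},
    "t13": {"type": "tree", "entries": ["quirk.c"]},
    "v2.6.14": {"type": "commit", "tree": "t14", "parents": ["v2.6.13"]},
    "t14": {"type": "tree", "entries": []},
    "tip": {"type": "commit", "tree": "t15", "parents": ["v2.6.14"]},
    "t15": {"type": "tree", "entries": ["quirk.c"]},
    "quirk.c": {"type": "blob"},
}

request = ["graft v2.6.14", "want tip"]
without = objects_to_send(store, ["tip"], ["v2.6.14"], {})
with_grafts = objects_to_send(store, ["tip"], ["v2.6.14"], parse_grafts(request))
print(sorted(without))      # quirk.c wrongly assumed to be present
print(sorted(with_grafts))  # quirk.c is sent
```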
* Re: [RFC] shallow clone
From: Johannes Schindelin @ 2006-01-30 11:39 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Sun, 29 Jan 2006, Junio C Hamano wrote:

> Strategy
> --------
>
> We have `info/grafts` mechanism to fake parent information for
> commit objects. Using this facility, we could roughly do:
>
> . Download the full tree for v2.6.14 commit and store its
>   objects locally.

On first read, I mistook "tree" for "commit"...

> . Set up `info/grafts` to lie to the local git that Linux kernel
>   history began at v2.6.14 version.

Maybe also record this in .git/config, so that you can

- disallow fetching from this repo, and
- easily extend the shallow copy to a larger shallow one, or a full one.

> . Run `git fetch git://.../linux-2.6 master`, with a local ref
>   pointing at v2.6.14 commit, to pretend that we have everything
>   up to v2.6.14 to `upload-pack` running on the other end.

How about refs/tags/start_shallow?

> . Update the `origin` branch with the master commit object name
>   we just fetched from Linus.
>
> Design
> ------
>
> [...]
>
> Another functionality we would need is to tell `upload-pack` to
> use `info/grafts` of downloader's choice. With this, after
> fetching the objects for v2.6.14 commit, the downloader can set
> up its own grafts file to cauterize the development history at
> v2.6.14, and tell the `upload-pack` to pretend the kernel
> history starts at that commit, while sending the tip of Linus'
> development track to us.

Why not just start another fetch? Then, "have <refs/tags/start_shallow>" would be sent, and upload-pack does the right thing?
If you absolutely want to get only one pack, which then is stored as-is, upload-pack could start two rev-list processes: one for the tree and one for all the rest.

> [...]
>
> [NOTE]
> Most likely this is not directly run by the user but is run as
> the first command invoked by the shallow clone script.

Better make it an option to git-clone.

> 4. `upload-pack` notices this is a single commit request, and
>    sends an ACK if it can satisfy the request (or a NAK if it
>    can't, e.g. it does not have the asked commit). Instead of
>    doing the usual `get_common_commits` followed by
>    `create_pack_file`, it does:
>
>         $ git rev-list -n1 --objects $commit | git pack-objects --stdout

Here it could say

    (git rev-list -n1 --objects $commit_since;
     git rev-list --objects ^$commit_since $commit) | git pack-objects --stdout

If the former is still needed (e.g. for git-tar-remote-tree), we could distinguish "single <ref>" and "shallow <ref>" commands.

> [...]
>
> The second phase of the shallow clone is to fetch the history
> since v2.6.14 to the tip.

As I outlined above, I don't see the need for this.

Ciao,
Dscho
* Re: [RFC] shallow clone
From: Simon Richter @ 2006-01-30 11:58 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

>>. Set up `info/grafts` to lie to the local git that Linux kernel
>>  history began at v2.6.14 version.

> Maybe also record this in .git/config, so that you can

I like that "config" thing less and less every day. It appears to become a kind of registry, where having dedicated files for specific functionality would provide the robustness of tools not having to touch things they do not care about; but that's just personal opinion.

> - disallow fetching from this repo, and

Why? It's perfectly acceptable to pull from an incomplete repo, as long as you don't care about the old history.

> - easily extend the shallow copy to a larger shallow one, or a full one.

Hrm, I think there should also be a way to shrink a repo and "forget" old history occasionally (obviously, use of that feature would be highly discouraged).

>>. Run `git fetch git://.../linux-2.6 master`, with a local ref
>>  pointing at v2.6.14 commit, to pretend that we have everything
>>  up to v2.6.14 to `upload-pack` running on the other end.

> How about refs/tags/start_shallow?

No, as that would imply that cloning from such a repo is disallowed.

IMO, it may be a lot more robust to just have a list of "cutoff" object ids in .git/shallow instead of messing with grafts here, as adding or removing a line from that file is an easier thing to do for porcelain (or by hand) than rewriting the grafts file. Whether that list would be inclusive or exclusive would need to be decided still.
Simon
* Re: [RFC] shallow clone
From: Johannes Schindelin @ 2006-01-30 12:13 UTC
To: Simon Richter; +Cc: Junio C Hamano, git

Hi,

On Mon, 30 Jan 2006, Simon Richter wrote:

> Johannes Schindelin wrote:
>
> > > . Set up `info/grafts` to lie to the local git that Linux kernel
> > >   history began at v2.6.14 version.
> >
> > Maybe also record this in .git/config, so that you can
>
> I like that "config" thing less and less every day. It appears to become a
> kind of registry, where having dedicated files for specific functionality
> would provide the robustness of tools not having to touch things they do not
> care about; but that's just personal opinion.

It is becoming sort of a registry: it contains metadata about the current repository, easily available to scripts and programs.

I beg to differ on your personal opinion on the grounds that the robustness comes from testing, not from diversity. I much prefer to have a well tested config mechanism to having dozens of differently formatted files with less-than-well tested parsers.

Thank you for the insights in your personal opinion anyway.

> > - disallow fetching from this repo, and
>
> Why? It's perfectly acceptable to pull from an incomplete repo, as long as you
> don't care about the old history.

Right. But should that be the default? I don't think so. Therefore: disable it, and if the user is absolutely sure to do dumb things, she'll have to enable it explicitly.

> > - easily extend the shallow copy to a larger shallow one, or a full one.
>
> Hrm, I think there should also be a way to shrink a repo and "forget" old
> history occasionally (obviously, use of that feature would be highly
> discouraged).

Yes. And you need information about how shallow it used to be.
My suggestion was to store that information at a place specific to that repository (see above).

> > > . Run `git fetch git://.../linux-2.6 master`, with a local ref
> > >   pointing at v2.6.14 commit, to pretend that we have everything
> > >   up to v2.6.14 to `upload-pack` running on the other end.
> >
> > How about refs/tags/start_shallow?
>
> No, as that would imply that cloning from such a repo is disallowed.

See above.

> IMO, it may be a lot more robust to just have a list of "cutoff" object ids in
> .git/shallow instead of messing with grafts here, as adding or removing a line
> from that file is an easier thing to do for porcelain (or by hand) than
> rewriting the grafts file. Whether that list would be inclusive or exclusive
> would need to be decided still.

The functionality of cutoff objects is included in grafts functionality, so why should we spend time on reimplementing a subset of features?

IMHO, adding and removing lines from scripts is fragile.

I beg your pardon, you want to edit this information *by hand*? Wow.

Ciao,
Dscho
* Re: [RFC] shallow clone
From: Simon Richter @ 2006-01-30 13:25 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

[config as a registry]

> It is becoming sort of a registry: it contains metadata about the current
> repository, easily available to scripts and programs.

Provided you have a parser that can handle it.

> I beg to differ on your personal opinion on the grounds that the
> robustness comes from testing, not from diversity. I much prefer to have a
> well tested config mechanism to having dozens of differently formatted
> files with less-than-well tested parsers.

Indeed. But we already have a method for associating data values with keys in a hierarchical namespace, and that one is pretty well tested. :-)

>>Why? It's perfectly acceptable to pull from an incomplete repo, as long as you
>>don't care about the old history.

> Right. But should that be the default? I don't think so. Therefore:
> disable it, and if the user is absolutely sure to do dumb things, she'll
> have to enable it explicitly.

What harm is done if I have an incomplete repository? It would probably make more sense to emit a warning on clone and explain things if the user tries to go to a version she doesn't have.

>>Hrm, I think there should also be a way to shrink a repo and "forget" old
>>history occasionally (obviously, use of that feature would be highly
>>discouraged).

> Yes. And you need information about how shallow it used to be. My
> suggestion was to store that information at a place specific to that
> repository (see above).

Indeed, but you are keeping this information in two places, namely the grafts file and the config file. This is asking for trouble if they ever get out of sync.

>>>How about refs/tags/start_shallow?
>>No, as that would imply that cloning from such a repo is disallowed.

> See above.

Well, I can however see the use case of a developer hosting an incomplete repo on a free web service and another developer wanting to merge her changes into her (complete) repo. You would have to special-case this tag in the fetch operation to avoid copying it over.

What's probably worse: you can only have a single cutoff point that way. You probably want multiple in case you want to cut off at a place where development happened in multiple branches that got subsequently merged inside the window of objects you keep.

> The functionality of cutoff objects is included in grafts functionality,
> so why should we spend time on reimplementing a subset of features?

I would ask for the grafts parser to add "fake" grafts when it encounters the "shallow" file. Otherwise, it would be hard to distinguish between grafts the user made when doing interesting merges and grafts that were created to build a shallow repo, because you would need some heuristics to tell the latter from the former if you want to have a function in your porcelain to "pull more/all objects".

> I beg your pardon, you want to edit this information *by hand*? Wow.

Yes. That is actually the reason I like git so much: I can repair it by hand if something breaks, and this can be done with simple commands. I can remove an object id from a file with "grep -v" or perl. I would need to fire up an editor or hack a longer script if I wanted to fix something inside a complex file that does multiple things.

Simon
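Simon's suggestion can be sketched in Python. Cutoffs live in their own file, one object name per line, and the grafts loader turns each one into a parentless graft while remembering that it is a cutoff, so porcelain can tell cutoffs apart from hand-made grafts. This is an illustration under assumptions. File contents are passed as strings, the helper names are hypothetical, and the rule that a cutoff overrides a user graft for the same commit is a design choice made for this sketch, not an existing git format.

```python
# Hypothetical sketch of a separate cutoff file; names and precedence
# rules are assumptions, not an existing git format.

def parse_lines(text):
    """Split non-empty lines into lists of object names."""
    return [line.split() for line in text.splitlines() if line.strip()]


def load_grafts(grafts_text, shallow_text):
    """Return {commit: (parents, is_cutoff)}."""
    table = {}
    for commit, *parents in parse_lines(grafts_text):
        table[commit] = (parents, False)   # graft the user made by hand
    for commit, *_ in parse_lines(shallow_text):
        table[commit] = ([], True)         # cutoff: commit becomes a root
    return table


def deepen(shallow_text, commit):
    """Drop one cutoff line (a stricter 'grep -v'); the commit's real
    parents apply again once their objects have been fetched."""
    return "".join(line + "\n" for line in shallow_text.splitlines()
                   if line.strip() != commit)


grafts_file = "c9 c3 c7\n"   # user graft: c9 gets parents c3 and c7
shallow_file = "c5\nc8\n"    # two cutoff points, e.g. on two merged branches

table = load_grafts(grafts_file, shallow_file)
cutoffs = sorted(c for c, (_, is_cutoff) in table.items() if is_cutoff)
print(table["c9"])
print(cutoffs)
print(repr(deepen(shallow_file, "c5")))
```

Keeping the cutoff flag next to the parents lets a "pull more history" command touch only cutoff entries and leave hand-made grafts alone.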
* Re: [RFC] shallow clone
From: Junio C Hamano @ 2006-01-30 19:25 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > - disallow fetching from this repo, and
>>
>> Why? It's perfectly acceptable to pull from an incomplete
>> repo, as long as you don't care about the old history.
>
> Right. But should that be the default? I don't think so. Therefore:
> disable it, and if the user is absolutely sure to do dumb things, she'll
> have to enable it explicitly.

If the downstream person wants to have a shallow history of post X.org X server core to further hack on it, I do not think of a reason why we would want to refuse her from cloning a repository of a fellow developer who has already done such a shallow copy.

If such a clone is done without telling the downstream that the result is a shallow one, it is "dumb". I would agree it should not be done. Because of this, we need to propagate the grafts to the downstream when a clone is done.

By the way, please refrain from discussing .git/config vs .git/separate-config-files issue in this thread. My personal feeling so far is that the information current graft represents is good enough to support shallow clones, and if not we can extend its semantics to support such. Whether it is a good idea to move the final result (grafts with updated semantics) to the config file can be discussed independently. Even if we end up not doing any of the shallow cloning support we have been discussing, moving the information in .git/info/grafts to config might make sense. The issue is tangential.
* Re: [RFC] shallow clone
From: Johannes Schindelin @ 2006-01-31 11:28 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Mon, 30 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> > - disallow fetching from this repo, and
> >>
> >> Why? It's perfectly acceptable to pull from an incomplete
> >> repo, as long as you don't care about the old history.
> >
> > Right. But should that be the default? I don't think so. Therefore:
> > disable it, and if the user is absolutely sure to do dumb things, she'll
> > have to enable it explicitly.
>
> If the downstream person wants to have a shallow history of post
> X.org X server core to further hack on it, I do not think of a
> reason why we would want to refuse her from cloning a repository
> of a fellow developer who has already done such a shallow copy.

Okay. But in their case, they'll probably do what was done with Linux: start afresh. If you want to have the old history, you can import it and merge it via a graft.

> If such a clone is done without telling the downstream that the
> result is a shallow one, it is "dumb". I would agree it should
> not be done.

That was my point. As long as you don't make sure the client handles the shallow upstream gracefully, it is dangerous. At the moment, there are too many code parts relying on the completeness of the repository (local and remote).

Since I wrote this, I realized that the problem I saw is not limited to a shallow upstream; there is a subtle issue with shallow downstreams, too:

Just imagine this: Alice starts a project, Bob makes a shallow copy from it when Alice just reverted an experimental feature. Then, Alice decides the experimental feature was not bad at all and reverts the revert.
Bob pulls from Alice: Alice's upload-pack assumes Bob already has the original files (now re-reverted), and Bob ends up with a broken repository.

While writing the last paragraph, it became clear to me that the shallow thing is very fragile: IMHO it is impossible to be fully backwards compatible (remember: you should not force anybody to upgrade).

> By the way, please refrain from discussing .git/config vs
> .git/separate-config-files issue in this thread.

Okay. I will shut up on that issue.

> My personal feeling so far is that the information current graft
> represents is good enough to support shallow clones, and if not we can
> extend its semantics to support such.

No. The grafts are more powerful. I have quite a few repos here in which I heavily work with grafts, and they are not cutoffs for shallow repos. They are hard links between different lines of development. For example, I use them to map merges in cvsimported projects, thus fixing a shortcoming of CVS. Also, you can "add" history.

If you now rely on the grafts file to determine what was a cutoff, you may well end up with bogus cutoffs.

Ciao,
Dscho
* Re: [RFC] shallow clone
From: Simon Richter @ 2006-01-31 13:05 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

>>If the downstream person wants to have a shallow history of post
>>X.org X server core to further hack on it, I do not think of a
>>reason why we would want to refuse her from cloning a repository
>>of a fellow developer who has already done such a shallow copy.

> Okay. But in their case, they'll probably do what was done with Linux:
> start afresh. If you want to have the old history, you can import it and
> merge it via a graft.

Well, in the Linux case the problem was not knowing what the SHA1 sum of the entire Linux history was. In the shallow repo case we know it, so there is no point in throwing away that information.

>>If such a clone is done without telling the downstream that the
>>result is a shallow one, it is "dumb". I would agree it should
>>not be done.

> That was my point. As long as you don't make sure the client handles the
> shallow upstream gracefully, it is dangerous. At the moment, there are too
> many code parts relying on the completeness of the repository (local and
> remote).

Well, the important thing would be that commands that can work (a merge only needs to find the most recent common ancestor, etc) do work, and commands that cannot ("log") emit sensible diagnostics.

> Just imagine this: Alice starts a project, Bob makes a shallow copy from
> it when Alice just reverted an experimental feature. Then, Alice decides
> the experimental feature was not bad at all and reverts the revert. Bob
> pulls from Alice: Alice's upload-pack assumes Bob already has the original
> files (now re-reverted), and Bob ends up with a broken repository.
I know far too little about the internal workings for that, but I'd assume that in this case Bob's copy starts at the commit that was never in question (and he never saw the reverted commit), and Alice's contains a commit on top of that. That one should work.

But the other way 'round is problematic, when Bob starts with a commit that has been reverted in Alice's repository. The solution is for Bob to ask Alice's repo for the common ancestor of his shallow base and Alice's HEAD. Alice's repo can, however, fail to deliver these if there has been a purge since; in that case, stuff needs to be merged by hand (but you already have a problem if someone clones your repo before you revert changes, so no regression here).

> If you now rely on the grafts file to determine what was a cutoff, you may
> well end up with bogus cutoffs.

Exactly that was my concern earlier; my database design gut feeling tells me that information duplication is not good either, hence my suggestion to split off these grafts into a separate file in order to mark them as cutoff points.

Simon
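Simon's point that a merge only needs the most recent common ancestor can be checked against a shallow history with a small Python model. Commit names are fake, and `with_cutoffs` stands in for parentless grafts. `merge_bases` is a naive common-ancestor search, not git's own algorithm. The model also ignores how the side branch arrived in the repository. Under those assumptions, the search succeeds while the fork point is inside the kept window and finds nothing once the fork point lies beyond the cutoff.

```python
# Toy model: commit names are fake; cutoffs act like parentless grafts.

def ancestors(parents, commit):
    """The commit itself plus everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another
    common ancestor (naive, quadratic search)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def with_cutoffs(parents, cutoffs):
    """Parent map as seen through parentless grafts at the cutoffs."""
    return {c: ([] if c in cutoffs else p) for c, p in parents.items()}


# Trunk c1 <- c2 <- c3 <- c4; side branch a1 forked from c2.
full = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "a1": ["c2"]}

print(merge_bases(full, "c4", "a1"))                        # full history
print(merge_bases(with_cutoffs(full, {"c2"}), "c4", "a1"))  # fork point kept
print(merge_bases(with_cutoffs(full, {"c3"}), "c4", "a1"))  # fork point cut off
```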
* Re: [RFC] shallow clone
From: Johannes Schindelin @ 2006-01-31 13:31 UTC
To: Simon Richter; +Cc: Junio C Hamano, git

Hi,

On Tue, 31 Jan 2006, Simon Richter wrote:

> Well, the important thing would be that commands that can work (a merge only
> needs to find the most recent common ancestor, etc) do work, and commands that
> cannot ("log") emit sensible diagnostics.

No, it would not.

A commit is a very small object which points (among others) to a tree object. A tree object corresponds to a directory (that is, it can point to a number of tree and blob objects). A blob object corresponds to a file (that is, git never parses its contents).

If two separate revisions contain the same file (i.e. same contents), this is not duplicated, but the corresponding tree objects point to the same object.

If you pull, upload-pack will think you have *every* object depending on every ref you have stored. Say you have three revisions, A -> B -> C, and A and C contain the same file bla.txt. If the client says it has B, the upstream upload-pack assumes you have bla.txt.

> I know far too little about the internal workings for that, [...]

I hope I clarified the important aspect.

> > If you now rely on the grafts file to determine what was a cutoff, you may
> > well end up with bogus cutoffs.
>
> Exactly that was my concern earlier; my database design gut feeling tells me
> that information duplication is not good either, [...]

You only have two choices: you proposed code duplication, and yours truly proposed data duplication.

As is known from good database design: a few redundancies here and there are typically needed for good performance.

Ciao,
Dscho
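Dscho's A -> B -> C example can be replayed in Python. `blob_id` follows the way git names blob objects, taking a SHA-1 over a `blob <size>` header, a NUL byte, and then the content. As a result, identical contents in A and C share one object name. The history model leaves out trees, and treating "what upload-pack sends" as a plain set difference is a simplification assumed for illustration. The client's object set models a shallow clone rooted at B.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name git gives a blob: SHA-1 of b'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


history = {  # commit: (parent, {path: content}); trees left out for brevity
    "A": (None, {"bla.txt": b"old\n"}),
    "B": ("A", {"bla.txt": b"new\n"}),
    "C": ("B", {"bla.txt": b"old\n"}),   # C reverts B
}


def objects_from(commit):
    """Commit names and blob ids reachable from commit (full history)."""
    objs = set()
    while commit is not None:
        parent, files = history[commit]
        objs.add(commit)
        objs.update(blob_id(data) for data in files.values())
        commit = parent
    return objs


sent = objects_from("C") - objects_from("B")    # client said "have B"
shallow_client = {"B", blob_id(b"new\n")}       # cloned with B as its root
needed = {blob_id(d) for d in history["C"][1].values()}
missing = needed - shallow_client - sent
print(sorted(sent))
print(len(missing))   # the "old" bla.txt blob never arrives
```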
* Re: [RFC] shallow clone
From: Simon Richter @ 2006-01-31 14:23 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi,

Johannes Schindelin wrote:

> If you pull, upload-pack will think you have *every* object depending on
> every ref you have stored.

Ah, okay. That was the missing information, thanks.

> You only have two choices: you proposed code duplication, and yours truly
> proposed data duplication.

Erm, if there are multiple places for parsing a grafts file, that needs to be addressed as well.

> As is known from good database design: a few redundancies here and there
> are typically needed for good performance.

Sure, but only if you can "rebuild" all the redundant information reliably.

Simon
* Re: [RFC] shallow clone
From: Junio C Hamano @ 2006-01-30 19:25 UTC
To: Simon Richter; +Cc: git

Simon Richter <Simon.Richter@hogyros.de> writes:

>> - disallow fetching from this repo, and
>
> Why? It's perfectly acceptable to pull from an incomplete repo, as
> long as you don't care about the old history.

I agree. As long as the cloned one can record itself as a shallow one (and with what epochs), I do not see a reason to forbid second generation clone from a shallow repository.

> Hrm, I think there should also be a way to shrink a repo and "forget"
> old history occasionally (obviously, use of that feature would be
> highly discouraged).

I do not think of a reason to discourage it, and I think you can do the "forgetting" part with the current set of tools. Choose appropriate cauterizing points, set up info/grafts, and running "repack -a -d" would be sufficient.

> IMO, it may be a lot more robust to just have a list of "cutoff"
> object ids in .git/shallow instead of messing with grafts here, as
> adding or removing a line from that file is an easier thing to do for
> porcelain (or by hand) than rewriting the grafts file. Whether that
> list would be inclusive or exclusive would need to be decided still.

I would rather not have .git/shallow or .git/shallow_start.

Cauterizing is not any more special than other grafts entries. If you have grafted historical kernel repository behind the official kernel repository with 2.6.12-rc2 epoch, I do not think of any reason to forbid people from cloning such with the grafts.
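Junio's recipe for forgetting old history (cauterize with info/grafts, then `repack -a -d`) amounts to keeping only what is still reachable from the refs once the grafts apply. The Python model below is a hypothetical sketch of that selection. It uses fake object names and ignores loose objects, pruning and other repack details. It also shows that a blob shared with the forgotten history survives as long as a kept tree still points at it.

```python
# Toy model of cauterize-then-repack; names are fake and loose objects,
# pruning and other repack details are ignored.

def parse_grafts(text):
    """info/grafts format: '<commit> [<parent>...]' on each line."""
    rows = (line.split() for line in text.splitlines() if line.strip())
    return {commit: parents for commit, *parents in rows}


def reachable(store, tips, grafts):
    """Objects reachable from tips, with grafts replacing real parents."""
    seen, stack = set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = store[oid]
        if obj["type"] == "commit":
            stack.append(obj["tree"])
            stack.extend(grafts.get(oid, obj["parents"]))
        elif obj["type"] == "tree":
            stack.extend(obj["entries"])
    return seen


store = {
    "c1": {"type": "commit", "tree": "t1", "parents": []},
    "t1": {"type": "tree", "entries": ["old.txt", "keep.txt"]},
    "c2": {"type": "commit", "tree": "t2", "parents": ["c1"]},
    "t2": {"type": "tree", "entries": ["keep.txt"]},
    "old.txt": {"type": "blob"},
    "keep.txt": {"type": "blob"},
}

kept = reachable(store, ["c2"], parse_grafts("c2\n"))  # the only ref is c2
dropped = set(store) - kept
print(sorted(kept))
print(sorted(dropped))
```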
* Re: [RFC] shallow clone
From: Franck @ 2006-01-31 8:37 UTC
To: Junio C Hamano; +Cc: Simon Richter, git

2006/1/30, Junio C Hamano <junkio@cox.net>:
> Simon Richter <Simon.Richter@hogyros.de> writes:
>
> >> - disallow fetching from this repo, and
> >
> > Why? It's perfectly acceptable to pull from an incomplete repo, as
> > long as you don't care about the old history.
>
> I agree. As long as the cloned one can record itself as a
> shallow one (and with what epochs), I do not see a reason to
> forbid second generation clone from a shallow repository.
>

I agree too.

> Cauterizing is not any more special than other grafts entries.
> If you have grafted historical kernel repository behind the
> official kernel repository with 2.6.12-rc2 epoch, I do not think
> of any reason to forbid people from cloning such with the
> grafts.
>

I built my public repository from a cauterized one, and everybody who
is pulling from mine is aware of the lack of the full history, but they
actually don't care. If someone is pulling from my repo, he actually
wants to work on my project, which does not need any of the old history...

Thanks
--
Franck
* Re: [RFC] shallow clone
From: Junio C Hamano @ 2006-01-31 8:51 UTC
To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> I built my public repository from a cauterized one, and everybody who
> is pulling from mine is aware of the lack of the full history, but they
> actually don't care. If someone is pulling from my repo, he actually
> wants to work on my project, which does not need any of the old history...

Mind writing up a howto on the topic?

 - How things are set up using the current tool.
 - How others initially clone from you.
 - How others update (pull) from you.
 - What are the pitfalls you and others need to avoid
   (i.e. operations that involve old history)

I brought this up because the lack of official support for shallow
cloning was cited as one of the showstoppers for a project that
once considered switching to git but didn't, as I found from some
mailing list research.
* Re: [RFC] shallow clone
From: Franck @ 2006-01-31 11:11 UTC
To: Junio C Hamano; +Cc: git

2006/1/31, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I built my public repository from a cauterized one, and everybody who
> > is pulling from mine is aware of the lack of the full history, but they
> > actually don't care. If someone is pulling from my repo, he actually
> > wants to work on my project, which does not need any of the old history...
>
> Mind writing up a howto on the topic?

OK, I'll try to sum something up this week; I hope my bad English will be
understandable...

>
>  - How things are set up using the current tool.
>  - How others initially clone from you.
>  - How others update (pull) from you.
>  - What are the pitfalls you and others need to avoid
>    (i.e. operations that involve old history)

Actually, I just discovered one, thanks to your first email in this thread
about reverted commits... So I'm not really the right person for this
section...

>
> I brought this up because the lack of official support for shallow
> cloning was cited as one of the showstoppers for a project that
> once considered switching to git but didn't, as I found from some
> mailing list research.

Again, I wasn't aware that this feature was really needed...

Thanks
--
Franck
* Re: [RFC] shallow clone
From: Junio C Hamano @ 2006-01-30 18:46 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> . Download the full tree for v2.6.14 commit and store its
>>   objects locally.
>
> On first read, I mistook "tree" for "commit"...

It turns out that this 'single' request step is unneeded, as long
as we implement 'graft' requests. We can then tell `upload-pack`:
"Cauterize at v2.6.14 and give me the master". `upload-pack` would
run `rev-list --objects master`, which tries to include everything
that is reachable from "master" but notices that the v2.6.14 commit
does not have any parent (thanks to the customized graft) and stops
there -- the result is the history since v2.6.14.

>> . Set up `info/grafts` to lie to the local git that Linux kernel
>>   history began at v2.6.14 version.
>
> Maybe also record this in .git/config, so that you can
>
> - disallow fetching from this repo, and
> - easily extend the shallow copy to a larger shallow one, or a full one.

I thought about that before I wrote the message, but it boils down
to grepping lines from grafts that have only one object name
(i.e. cauterizing records), so it is redundant.

Also, there is no strict reason to forbid cloning from such a
shallow repository. No harm is done as long as you make it clear
to somebody who clones from you that what you have is a shallow
copy, so that the cloned repository can cauterize history at
appropriate places.
A second generation clone, when cloning from a shallow repository,
needs to mark itself as having the same or shallower history
(otherwise a third generation clone from it would not work). So the
`upload-pack` protocol needs to be updated to send the downloader the
grafts information that the `upload-pack` side usually uses, even
when the downloader does not make a 'graft' request. Once that is
done, you should be able to clone safely from a shallow repository
and end up with a repository with the same (or shallower -- if you
asked to make a shallow clone from it) history.

> Why not just start another fetch? Then, "have <refs/tags/start_shallow>"
> would be sent, and upload-pack does the right thing?

Yes, almost. We need to realize that an `upload-pack` that hears
"have A, want B" is allowed to omit objects that appear in `ls-tree B`
output but not in `ls-tree A`, as long as they appear in one of A's
ancestors. "have A" means not just "I have A", but "I have A and all
of its ancestors", so just sending "have start_shallow" (or
start_shallow^ for that matter) is not quite enough [*1*].

> If you absolutely want to get only one pack, which then is stored as-is,
> upload-pack could start two rev-list processes: one for the tree and one
> for all the rest.

The message you are responding to did two separate transfers (one
'single', and another 'fetch'); I do not particularly mind doing two
(it is just an initial clone anyway), but as I said, it turns out
that we do not need the initial 'single'.

>> [NOTE]
>> Most likely this is not directly run by the user but is run as
>> the first command invoked by the shallow clone script.
>
> Better make it an option to git-clone

Probably -- I was just outlining the lowest-level mechanism and
haven't thought much about the UI.

[Footnote]

*1* This is true even without the more aggressive optimization by
rev-list, which does not exist yet. Here is a minimalistic
demonstration: a one-file project with a handful of straight-line
commits.
Each change to the file reverts the change made by the previous
commit.

 * The HEAD commit has "white", HEAD~1 "black", and HEAD~2 "white".

 * We say we are interested in things since HEAD~2 (i.e. we pretend
   that the history starts at HEAD~1 and it does not have a parent)
   and ask for HEAD.

 * Notice that only one copy of the file appears in the output: the
   "black" blob. We do not get the "white" blob because we are
   telling it that we _have_ HEAD~2. The resulting set of objects is
   not enough to check out the HEAD commit.

This roughly corresponds to your "have shallow_start", but not
quite -- in that sequence you have the objects for the HEAD~2
commit. But the point is that I want to leave the door open for
optimizing upload-pack, so that it can choose to omit objects that
do not appear in A when you say "have A", if the object appears in
one of A's ancestors.

-- >8 --
#!/bin/sh
rm -fr .git
git init-db
zebra=white
echo $zebra >file
git add file
git commit -m initial
# Each further commit flips the content, reverting the previous change.
for i in 0 1 2 3 4 5
do
	case $zebra in
	white) zebra=black ;;
	black) zebra=white ;;
	esac
	echo $zebra >file
	git commit -a -m "$i $zebra"
done
# Objects rev-list lists for HEAD~2..HEAD, with commits named by name-rev.
git rev-list --objects HEAD~2..HEAD | git name-rev --stdin
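The outcome of the demonstration can be reproduced with a toy model. This is a simplified sketch, not git's traversal. In the model, each commit of a linear history stores one file, and identical content means the same blob. `blobs_to_send` is a hypothetical function that returns the blobs a sender would transmit for "want HEAD, have A".

With `aggressive` set to 0, only A's own tree counts as already present, which matches what the demonstration shows. With it set to 1, the trees of all of A's ancestors count too, which is what "have A" permits.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOBS 64

/* Toy model: commit i of a linear history holds one file whose
 * content ("blob") is hist[i]; hist[0] is the root commit. */

static int in_set(const char **set, int n, const char *blob)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], blob) == 0)
            return 1;
    return 0;
}

/* Blobs sent for "want hist[n-1], have hist[have]".  If aggressive is 0,
 * only the "have" commit's own blob counts as present on the receiver;
 * otherwise the blobs of all its ancestors count as well.  Returns the
 * number of distinct blobs stored into out (capacity MAX_BLOBS). */
static int blobs_to_send(const char **hist, int n, int have,
                         int aggressive, const char **out)
{
    const char *present[MAX_BLOBS];
    int npresent = 0, nout = 0;

    assert(n <= MAX_BLOBS && 0 <= have && have < n);
    for (int i = aggressive ? 0 : have; i <= have; i++)
        if (!in_set(present, npresent, hist[i]))
            present[npresent++] = hist[i];
    for (int i = have + 1; i < n; i++)
        if (!in_set(present, npresent, hist[i]) &&
            !in_set(out, nout, hist[i]))
            out[nout++] = hist[i];
    return nout;
}
```

The script's history maps to index 0 for the initial commit and index 6 for HEAD, so HEAD~2 is index 4. The boundary-only mode yields just "black". The aggressive mode yields nothing at all. In neither mode is the "white" blob that HEAD needs sent.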
* [PATCH] Shallow clone: low level machinery.
From: Junio C Hamano @ 2006-01-31 11:02 UTC
To: Johannes Schindelin; +Cc: git

This adds a --shallow=refname option to git-clone-pack, and extends
the upload-pack protocol with a "shallow" extension.

An example:

	$ mkdir junk && cd junk && git init-db
	$ git clone-pack --shallow=refs/heads/master ../git.git master

This creates a very shallow clone of my repository. It says "pretend
refs/heads/master commit is the beginning of time, and clone your
master branch". As before, clone-pack with an explicit head name
outputs the commit object name and refname to the standard output
instead of creating the branch. The command also creates a
.git/info/grafts file to cauterize the history at that commit.

I think the upload-pack side is more or less ready to be debugged,
but the client side is highly experimental. It has quite serious
limitations and is more of a proof of correctness at the protocol
extension level than something for practical use:

 - Currently it can take only one --shallow option.
 - It has to be spelled in full (refs/heads/master, not "master").
 - It has to be included as part of the explicit refname list.
 - There is no matching --shallow in git-fetch-pack.
Signed-off-by: Junio C Hamano <junkio@cox.net> --- cache.h | 9 +++ clone-pack.c | 69 ++++++++++++++++++++++- commit-tree.c | 5 -- commit.c | 174 +++++++++++++++++++++++++++++++++++++++------------------ commit.h | 14 +++++ connect.c | 24 ++++++++ object.c | 7 ++ object.h | 2 + upload-pack.c | 94 +++++++++++++++++++++++++++++-- 9 files changed, 331 insertions(+), 67 deletions(-) 75f1f4871277f403991c771eb642bdbd6fe82021 diff --git a/cache.h b/cache.h index bdbe2d6..18d4cdb 100644 --- a/cache.h +++ b/cache.h @@ -111,11 +111,18 @@ static inline unsigned int create_ce_mod extern struct cache_entry **active_cache; extern unsigned int active_nr, active_alloc, active_cache_changed; +/* + * Having more than two parents is not strange at all, and this is + * how multi-way merges are represented. + */ +#define MAXPARENT (16) + #define GIT_DIR_ENVIRONMENT "GIT_DIR" #define DEFAULT_GIT_DIR_ENVIRONMENT ".git" #define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY" #define INDEX_ENVIRONMENT "GIT_INDEX_FILE" #define GRAFT_ENVIRONMENT "GIT_GRAFT_FILE" +#define GRAFT_INFO_ENVIRONMENT "GIT_GRAFT_INFO" extern char *get_git_dir(void); extern char *get_object_directory(void); @@ -296,6 +303,8 @@ struct ref { char name[FLEX_ARRAY]; /* more */ }; +extern void send_graft_info(int); + extern int git_connect(int fd[2], char *url, const char *prog); extern int finish_connect(pid_t pid); extern int path_match(const char *path, int nr, char **match); diff --git a/clone-pack.c b/clone-pack.c index f634431..c1708d5 100644 --- a/clone-pack.c +++ b/clone-pack.c @@ -1,15 +1,76 @@ #include "cache.h" #include "refs.h" #include "pkt-line.h" +#include "commit.h" static const char clone_pack_usage[] = -"git-clone-pack [--exec=<git-upload-pack>] [<host>:]<directory> [<heads>]*"; +"git-clone-pack [--shallow=name] [--exec=<git-upload-pack>] [<host>:]<directory> [<heads>]*"; static const char *exec = "git-upload-pack"; +static char *shallow = NULL; + +static void shallow_exchange(int fd[2], struct ref *ref) +{ + char 
line[1024]; + char *graft_file; + FILE *fp; + int i, j; + + while (ref) { + if (!strcmp(ref->name, shallow)) + break; + ref = ref->next; + } + if (!ref) + die("No matching ref specified for shallow clone %s", + shallow); + if (!server_supports("shallow")) + die("The other end does not support shallow clone"); + packet_write(fd[1], "shallow\n"); + packet_flush(fd[1]); + + /* Read their graft */ + prepare_commit_graft(); + for (;;) { + int len; + len = packet_read_line(fd[0], line, sizeof(line)); + if (!len) + break; + add_graft_info(line); + } + /* And cauterize at --shallow=<sha1> */ + sprintf(line, "%s\n", sha1_to_hex(ref->old_sha1)); + add_graft_info(line); + + /* tell ours */ + packet_write(fd[1], "custom\n"); + send_graft_info(fd[1]); + packet_flush(fd[1]); + + /* write out ours */ + graft_file = get_graft_file(); + fp = fopen(graft_file, "w"); + if (!fp) + die("cannot update grafts!"); + + for (i = 0; i < commit_graft_nr; i++) { + struct commit_graft *g = commit_graft[i]; + fputs(sha1_to_hex(g->sha1), fp); + for (j = 0; j < g->nr_parent; j++) { + fputc(' ', fp); + fputs(sha1_to_hex(g->parent[j]), fp); + } + fputc('\n', fp); + } + fclose(fp); +} static void clone_handshake(int fd[2], struct ref *ref) { unsigned char sha1[20]; + if (shallow) + shallow_exchange(fd, ref); + while (ref) { packet_write(fd[1], "want %s\n", sha1_to_hex(ref->old_sha1)); ref = ref->next; @@ -160,6 +221,10 @@ int main(int argc, char **argv) exec = arg + 7; continue; } + if (!strncmp("--shallow=", arg, 10)) { + shallow = arg + 10; + continue; + } usage(clone_pack_usage); } dest = arg; @@ -167,6 +232,8 @@ int main(int argc, char **argv) nr_heads = argc - i - 1; break; } + if (shallow && !nr_heads) + die("shallow clone needs an explicit head name"); if (!dest) usage(clone_pack_usage); pid = git_connect(fd, dest, exec); diff --git a/commit-tree.c b/commit-tree.c index 4634b50..cbf2979 100644 --- a/commit-tree.c +++ b/commit-tree.c @@ -53,11 +53,6 @@ static void check_valid(unsigned char *s 
free(buf); } -/* - * Having more than two parents is not strange at all, and this is - * how multi-way merges are represented. - */ -#define MAXPARENT (16) static unsigned char parent_sha1[MAXPARENT][20]; static const char commit_tree_usage[] = "git-commit-tree <sha1> [-p <sha1>]* < changelog"; diff --git a/commit.c b/commit.c index 97205bf..a862287 100644 --- a/commit.c +++ b/commit.c @@ -102,12 +102,8 @@ static unsigned long parse_commit_date(c return date; } -static struct commit_graft { - unsigned char sha1[20]; - int nr_parent; - unsigned char parent[0][20]; /* more */ -} **commit_graft; -static int commit_graft_alloc, commit_graft_nr; +struct commit_graft **commit_graft; +int commit_graft_alloc, commit_graft_nr; static int commit_graft_pos(const unsigned char *sha1) { @@ -128,62 +124,104 @@ static int commit_graft_pos(const unsign return -lo - 1; } -static void prepare_commit_graft(void) +int add_graft_info(char *buf) { - char *graft_file = get_graft_file(); - FILE *fp = fopen(graft_file, "r"); + /* The format is just "Commit Parent1 Parent2 ...\n" */ + int len = strlen(buf); + int i; + struct commit_graft *graft = NULL; + + if (buf[len-1] == '\n') + buf[--len] = 0; + if (buf[0] == '#') + return 0; + if ((len + 1) % 41) { + bad_graft_data: + error("bad graft data: %s", buf); + free(graft); + return -1; + } + i = (len + 1) / 41 - 1; + graft = xmalloc(sizeof(*graft) + 20 * i); + graft->nr_parent = i; + if (get_sha1_hex(buf, graft->sha1)) + goto bad_graft_data; + for (i = 40; i < len; i += 41) { + if (buf[i] != ' ') + goto bad_graft_data; + if (get_sha1_hex(buf + i + 1, graft->parent[i/41])) + goto bad_graft_data; + } + i = commit_graft_pos(graft->sha1); + if (0 <= i) { + free(commit_graft[i]); + commit_graft[i] = graft; + return 0; + } + i = -i - 1; + if (commit_graft_alloc <= ++commit_graft_nr) { + commit_graft_alloc = alloc_nr(commit_graft_alloc); + commit_graft = xrealloc(commit_graft, + sizeof(*commit_graft) * + commit_graft_alloc); + } + if (i < 
commit_graft_nr) + memmove(commit_graft + i + 1, + commit_graft + i, + (commit_graft_nr - i - 1) * + sizeof(*commit_graft)); + commit_graft[i] = graft; + return 0; +} + +void clear_commit_graft(void) +{ + int i; + for (i = 0; i < commit_graft_nr; i++) + free(commit_graft[i]); + free(commit_graft); + commit_graft_nr = commit_graft_alloc = 0; + commit_graft = NULL; +} + +void prepare_commit_graft(void) +{ + char *graft_file; + FILE *fp; char buf[1024]; + + if (getenv(GRAFT_INFO_ENVIRONMENT)) { + char *cp, *ep; + for (cp = getenv(GRAFT_INFO_ENVIRONMENT); + *cp; + cp = ep) { + int more = 0; + ep = strchr(cp, '\n'); + if (ep) { + more = 1; + *ep = '\0'; + } + else { + ep = cp + strlen(cp); + } + if (ep != cp) + add_graft_info(cp); + if (!more) + break; + *ep = '\n'; + ep++; + } + return; + } + graft_file = get_graft_file(); + fp = fopen(graft_file, "r"); if (!fp) { - commit_graft = (struct commit_graft **) "hack"; + commit_graft = (struct commit_graft **) xmalloc(1); return; } - while (fgets(buf, sizeof(buf), fp)) { - /* The format is just "Commit Parent1 Parent2 ...\n" */ - int len = strlen(buf); - int i; - struct commit_graft *graft = NULL; + while (fgets(buf, sizeof(buf), fp)) + add_graft_info(buf); - if (buf[len-1] == '\n') - buf[--len] = 0; - if (buf[0] == '#') - continue; - if ((len + 1) % 41) { - bad_graft_data: - error("bad graft data: %s", buf); - free(graft); - continue; - } - i = (len + 1) / 41 - 1; - graft = xmalloc(sizeof(*graft) + 20 * i); - graft->nr_parent = i; - if (get_sha1_hex(buf, graft->sha1)) - goto bad_graft_data; - for (i = 40; i < len; i += 41) { - if (buf[i] != ' ') - goto bad_graft_data; - if (get_sha1_hex(buf + i + 1, graft->parent[i/41])) - goto bad_graft_data; - } - i = commit_graft_pos(graft->sha1); - if (0 <= i) { - error("duplicate graft data: %s", buf); - free(graft); - continue; - } - i = -i - 1; - if (commit_graft_alloc <= ++commit_graft_nr) { - commit_graft_alloc = alloc_nr(commit_graft_alloc); - commit_graft = xrealloc(commit_graft, 
- sizeof(*commit_graft) * - commit_graft_alloc); - } - if (i < commit_graft_nr) - memmove(commit_graft + i + 1, - commit_graft + i, - (commit_graft_nr - i - 1) * - sizeof(*commit_graft)); - commit_graft[i] = graft; - } fclose(fp); } @@ -288,6 +326,30 @@ int parse_commit(struct commit *item) return ret; } +static void reparse_commit_parents(struct object *o) +{ + struct commit *c; + struct commit_list *parents; + if ((o->type != commit_type) || !o->parsed) + return; + c = (struct commit *)o; + parents = c->parents; + o->parsed = 0; + while (parents) { + struct commit_list *next = parents->next; + free(parents); + parents = next; + } + c->parents = NULL; + free(c->buffer); + c->buffer = NULL; +} + +void reparse_all_parsed_commits(void) +{ + for_each_object(reparse_commit_parents); +} + struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p) { struct commit_list *new_list = xmalloc(sizeof(struct commit_list)); diff --git a/commit.h b/commit.h index 986b22d..abc5b9e 100644 --- a/commit.h +++ b/commit.h @@ -17,6 +17,20 @@ struct commit { char *buffer; }; +struct commit_graft { + unsigned char sha1[20]; + int nr_parent; + unsigned char parent[0][20]; /* more */ +}; + +extern struct commit_graft **commit_graft; +extern int commit_graft_alloc, commit_graft_nr; + +extern void prepare_commit_graft(void); +extern void clear_commit_graft(void); +extern int add_graft_info(char *); +extern void reparse_all_parsed_commits(void); + extern int save_commit_buffer; extern const char *commit_type; diff --git a/connect.c b/connect.c index 3f2d65c..046d1da 100644 --- a/connect.c +++ b/connect.c @@ -3,6 +3,7 @@ #include "pkt-line.h" #include "quote.h" #include "refs.h" +#include "commit.h" #include <sys/wait.h> #include <sys/socket.h> #include <netinet/in.h> @@ -298,6 +299,29 @@ int match_refs(struct ref *src, struct r return 0; } +void send_graft_info(int outfd) +{ + int i, j; + char packet_buf[41*MAXPARENT], *buf; + + for (i = 0; i < commit_graft_nr; 
i++) { + struct commit_graft *g = commit_graft[i]; + buf = packet_buf; + memcpy(buf, sha1_to_hex(g->sha1), 40); + buf += 40; + if (MAXPARENT <= g->nr_parent) + die("insanely big octopus graft with %d parents: %s", + g->nr_parent, sha1_to_hex(g->sha1)); + for (j = 0; j < g->nr_parent; j++) { + *buf++ = ' '; + memcpy(buf, sha1_to_hex(g->parent[j]), 40); + buf += 40; + } + *buf = 0; + packet_write(outfd, "%s\n", packet_buf); + } +} + enum protocol { PROTO_LOCAL = 1, PROTO_SSH, diff --git a/object.c b/object.c index 1577f74..bbcfcd8 100644 --- a/object.c +++ b/object.c @@ -252,3 +252,10 @@ int object_list_contains(struct object_l } return 0; } + +void for_each_object(void (*fn)(struct object *)) +{ + int i; + for (i = 0; i < nr_objs; i++) + fn(objs[i]); +} diff --git a/object.h b/object.h index 0e76182..b4c9729 100644 --- a/object.h +++ b/object.h @@ -55,4 +55,6 @@ unsigned object_list_length(struct objec int object_list_contains(struct object_list *list, struct object *obj); +void for_each_object(void (*)(struct object *)); + #endif /* OBJECT_H */ diff --git a/upload-pack.c b/upload-pack.c index d198055..90ea549 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -13,11 +13,16 @@ static const char upload_pack_usage[] = #define WANTED (1U << 2) #define MAX_HAS 256 #define MAX_NEEDS 256 -static int nr_has = 0, nr_needs = 0, multi_ack = 0, nr_our_refs = 0; +#define MAX_PARENTS 20 +static int nr_has = 0, nr_needs = 0, nr_our_refs = 0; static unsigned char has_sha1[MAX_HAS][20]; static unsigned char needs_sha1[MAX_NEEDS][20]; static unsigned int timeout = 0; +/* protocol extensions */ +static int multi_ack = 0; +static int using_custom_graft = 0; + static void reset_timeout(void) { alarm(timeout); @@ -163,6 +168,77 @@ static int get_common_commits(void) } } +static void exchange_grafts(void) +{ + int len; + char line[41*MAX_PARENTS]; + + /* We heard "shallow"; drop up to the next flush */ + for (;;) { + len = packet_read_line(0, line, sizeof(line)); + reset_timeout(); + if 
(!len) + break; + } + + /* Send our graft */ + prepare_commit_graft(); + send_graft_info(1); + packet_flush(1); + + /* For precise common commits discovery, we need to use + * the graft information we received from them. + * But this is expensive, so the downloader first says + * if it wants to use our graft as is. + */ + len = packet_read_line(0, line, sizeof(line)); + reset_timeout(); + if (!len) + ; /* use ours as is */ + else if (!strcmp(line, "custom\n")) { + using_custom_graft = 1; + clear_commit_graft(); + for (;;) { + len = packet_read_line(0, line, sizeof(line)); + reset_timeout(); + if (!len) + break; + if (add_graft_info(line)) + die("Bad graft line %s", line); + } + /* And using that, we prepare our end. */ + reparse_all_parsed_commits(); + } + else + die("expected 'custom', got '%s'", line); +} + +static void setup_custom_graft(void) +{ + char *graft_env = strdup(GRAFT_INFO_ENVIRONMENT "="); + int envlen = strlen(graft_env); + int i, j; + + for (i = 0; i < commit_graft_nr; i++) { + struct commit_graft *g = commit_graft[i]; + char buf[41*MAX_PARENTS], *ptr; + ptr = buf; + memcpy(ptr, sha1_to_hex(g->sha1), 40); + ptr += 40; + for (j = 0; j < g->nr_parent; j++) { + *ptr++ = ' '; + memcpy(ptr, sha1_to_hex(g->parent[j]), 40); + ptr += 40; + } + *ptr++ = '\n'; + *ptr = 0; + graft_env = xrealloc(graft_env, envlen + (ptr - buf)); + memcpy(graft_env + envlen, buf, ptr - buf + 1); + envlen += ptr - buf; + } + putenv(graft_env); +} + static int receive_needs(void) { static char line[1000]; @@ -180,16 +256,22 @@ static int receive_needs(void) sha1_buf = dummy; if (needs == MAX_NEEDS) { fprintf(stderr, - "warning: supporting only a max of %d requests. " + "warning: supporting only a max of " + "%d requests. 
" "sending everything instead.\n", MAX_NEEDS); } else if (needs < MAX_NEEDS) sha1_buf = needs_sha1[needs]; - if (strncmp("want ", line, 5) || get_sha1_hex(line+5, sha1_buf)) + if (!strcmp("shallow\n", line)) { + exchange_grafts(); + continue; + } + if (strncmp("want ", line, 5) || + get_sha1_hex(line+5, sha1_buf)) die("git-upload-pack: protocol error, " - "expected to get sha, not '%s'", line); + "expected to get want-sha1, not '%s'", line); if (strstr(line+45, "multi_ack")) multi_ack = 1; @@ -213,7 +295,7 @@ static int receive_needs(void) static int send_ref(const char *refname, const unsigned char *sha1) { - static char *capabilities = "multi_ack"; + static char *capabilities = "multi_ack shallow"; struct object *o = parse_object(sha1); if (capabilities) @@ -243,6 +325,8 @@ static int upload_pack(void) if (!nr_needs) return 0; get_common_commits(); + if (using_custom_graft) + setup_custom_graft(); create_pack_file(); return 0; } -- 1.1.6.gefef ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] Shallow clone: low level machinery.
From: Johannes Schindelin @ 2006-01-31 13:58 UTC
To: Junio C Hamano; +Cc: git

Hi,

apart from my thinking this is not backward-compatible (you are supposed
to be able to pull from a complete repo, even if it has a
non-shallow-capable upload-pack), here are my comments:

- it is good that MAXPARENT and struct commit_graft are in more public
  places now.

- reparse_* is misleading. Nothing is reparsed, but rather "unparsed".

- I'd hesitate to let git-daemon write temporary files. That is a whole
  new can of security worms.

- It looks wrong to me to define MAX_PARENTS as 20 in upload-pack.c, when
  MAXPARENT is defined as 16 in cache.h.

- The custom_graft issue could be handled in a more elegant manner if
  git was lib'ified (no temporary file). Since that is already the
  plan, why not do that first, and come back later?

Ciao,
Dscho
* Re: [PATCH] Shallow clone: low level machinery.
From: Junio C Hamano @ 2006-01-31 17:49 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> apart from my thinking this is not backward-compatible (you are supposed
> to be able to pull from a complete repo, even if it has a
> non-shallow-capable upload-pack), here are my comments:

It cannot do a shallow clone against older servers, no. I think it
should be able to do a full clone from older servers, but I need to
double check -- at least that is how I meant to write that thing,
but it was late at night ;-).

> - it is good that MAXPARENT and struct commit_graft are in more public
>   places now.
>
> - reparse_* is misleading. Nothing is reparsed, but rather "unparsed".

I meant to reparse them there but forgot. Will remember to fix.

> - I'd hesitate to let git-daemon write temporary files. That is a whole
>   new can of security worms.
>
> - The custom_graft issue could be handled in a more elegant manner if
>   git was lib'ified (no temporary file). Since that is already the
>   plan, why not do that first, and come back later?

That is why it does not write any temporary files. It
introduces a way to read graft information from an environment
variable.

> - It looks wrong to me to define MAX_PARENTS as 20 in upload-pack.c, when
>   MAXPARENT is defined as 16 in cache.h.

This is a remnant of my earlier version, which did not move MAXPARENT
out of commit-tree; I forgot to clean it up before calling it a day.
Will remember to clean up.
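The environment-variable route works in two halves. In the patch, setup_custom_graft() joins the graft records with newlines into GIT_GRAFT_INFO before the pack is created. prepare_commit_graft() then splits that value back into lines instead of opening info/grafts.

Below is a minimal C sketch of that round trip, assuming the records are already formatted as they would be in info/grafts. `join_grafts` and `next_graft_line` are hypothetical helpers. Exporting the string with setenv() or putenv() is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join graft records (each without a trailing newline) into one
 * newline-terminated string suitable for an environment variable.
 * The caller frees the result. */
static char *join_grafts(const char **records, int n)
{
    size_t total = 1;                       /* terminating NUL */
    for (int i = 0; i < n; i++)
        total += strlen(records[i]) + 1;    /* record + '\n' */

    char *buf = malloc(total);
    assert(buf != NULL);
    char *p = buf;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(records[i]);
        memcpy(p, records[i], len);
        p += len;
        *p++ = '\n';
    }
    *p = '\0';
    return buf;
}

/* Copy the next non-empty line of *cursor into line (newline stripped,
 * NUL-terminated) and advance *cursor past it; the source string is not
 * modified.  Lines that do not fit in size bytes are skipped.  Returns 0
 * when no lines remain. */
static int next_graft_line(const char **cursor, char *line, size_t size)
{
    const char *p = *cursor;

    while (*p) {
        const char *end = strchr(p, '\n');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *next = p + len + (end ? 1 : 0);

        if (len > 0 && len < size) {
            memcpy(line, p, len);
            line[len] = '\0';
            *cursor = next;
            return 1;
        }
        p = next;
    }
    *cursor = p;
    return 0;
}
```

Each line produced by next_graft_line() has the same shape as a line read from info/grafts. The reading side can therefore feed both sources through one parser, which is what the patch does with add_graft_info().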
* Re: [PATCH] Shallow clone: low level machinery.
From: Johannes Schindelin @ 2006-01-31 18:06 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > apart from my thinking this is not backward-compatible (you are supposed
> > to be able to pull from a complete repo, even if it has a
> > non-shallow-capable upload-pack), here are my comments:
>
> It cannot do a shallow clone against older servers, no.

Worse, you cannot pull from older servers into shallow repos.

> > - The custom_graft issue could be handled in a more elegant manner if
> >   git was lib'ified (no temporary file). Since that is already the
> >   plan, why not do that first, and come back later?
>
> That is why it does not write any temporary files. It
> introduces a way to read graft information from an environment
> variable.

Ooops. I only saw that you setup_custom_grafts and assumed wrongly.

Ciao,
Dscho
* Re: [PATCH] Shallow clone: low level machinery.
From: Junio C Hamano @ 2006-01-31 18:22 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Worse, you cannot pull from older servers into shallow repos.

"have X" means a different thing if you do not have matching
grafts information, so I suspect that is fundamentally
unsolvable.

I am not sure you can convince "git-rev-list ^A" to mean "not at
A, but things before that are still interesting", especially when
you give many other heads to start traversing from, but if you
can, then you can do things at the rev-list command line parameter
level without doing the "exchange and use the same grafts"
trickery. That _might_ be easier to implement, but I do not see
an obvious correctness guarantee in the approach.

Implementation bugs aside, it is obvious that things _would_ work
correctly with the "exchange and use the same grafts" approach.
* Re: [PATCH] Shallow clone: low level machinery.
From: Johannes Schindelin @ 2006-02-01 14:33 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Worse, you cannot pull from older servers into shallow repos.
>
> "have X" means a different thing if you do not have matching
> grafts information, so I suspect that is fundamentally
> unsolvable.

If the shallow-capable client could realize that the server is not
shallow-capable *and* the local repo is shallow, and refuse to operate
(unless called with "-f", in which case the result may or may not be a
broken repo, which has to be fixed up manually by copying
over ORIG_HEAD to HEAD).

Of course, the client has to know that the local repo is shallow, which it
must not determine by looking at the grafts file.

> I am not sure you can convince "git-rev-list ^A" to mean "not at
> A, but things before that are still interesting", especially when
> you give many other heads to start traversing from, but if you
> can, then you can do things at the rev-list command line parameter
> level without doing the "exchange and use the same grafts"
> trickery. That _might_ be easier to implement, but I do not see
> an obvious correctness guarantee in the approach.

If you introduce a different "have X" -- like "have-no-parent X" -- and
teach git-rev-list that "~A" means "traverse the tree of A, but not A's
parents", you'd basically have everything you need, right?

> Implementation bugs aside, it is obvious that things _would_ work
> correctly with the "exchange and use the same grafts" approach.

Yes, I agree. But again, the local repo has to know which grafts were
introduced by making the repo shallow.

Ciao,
Dscho
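The refusal proposed here could be decided from the capability list that upload-pack advertises with its first ref. In the patch, send_ref() advertises "multi_ack shallow".

Below is a minimal C sketch. It assumes the capabilities arrive as one space-separated string. It also assumes the client already knows, by whatever means, whether its own repository is shallow. `has_capability` and `may_fetch` are hypothetical names, and the "-f" override is modeled as a plain flag.

```c
#include <assert.h>
#include <string.h>

/* Does the space-separated list caps contain name as a whole word? */
static int has_capability(const char *caps, const char *name)
{
    size_t len = strlen(name);
    const char *p = caps;

    assert(len > 0);
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == caps || p[-1] == ' ');
        int ends = (p[len] == '\0' || p[len] == ' ');
        if (starts && ends)
            return 1;
        p++;                    /* keep searching after a partial match */
    }
    return 0;
}

/* Refuse to fetch into a shallow repository from a server that cannot
 * exchange grafts, unless the user forces it (the "-f" case). */
static int may_fetch(int local_is_shallow, const char *server_caps, int force)
{
    if (!local_is_shallow || force)
        return 1;
    return has_capability(server_caps, "shallow");
}
```

A non-shallow repository fetches as before, whatever the server advertises. Only the combination of a shallow repository and an old server is refused, which is the case where "have X" would mean different things on the two ends.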
* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-01 14:33 ` Johannes Schindelin
@ 2006-02-01 20:27 ` Junio C Hamano
  2006-02-02 0:48 ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:27 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > Worse, you cannot pull from older servers into shallow repos.
>>
>> "have X" means a different thing if you do not have matching
>> grafts information, so I suspect that is fundamentally
>> unsolvable.
>
> If the shallow-capable client could realize that the server is not
> shallow-capable *and* the local repo is shallow, and refuse to operate
> (unless called with "-f", in which case the result may or may not be a
> broken repo, which has to be fixed up manually by copying
> over ORIG_HEAD to HEAD).

"If ... refuse to operate" then?  If "Then that is OK" is what you
meant to say, I agree (I meant to write the client code that way,
but I started only with the initial clone).  I said "fundamentally
unsolvable" because I thought you wanted it to do something sensible
without refusing even in such a case.

> Of course, the client has to know that the local repo is shallow, which it
> must not determine by looking at the grafts file.

Sorry, I fail to understand this requirement.  Why is it "it must not"?

> If you introduce a different "have X" -- like "have-no-parent X" -- and
> teach git-rev-list that "~A" means "traverse the tree of A, but not A's
> parents", you'd basically have everything you need, right?

If you have such a modified rev-list, yes.  I was having doubts
about keeping an obvious correctness guarantee when doing such
"rev-list ~A".

> Yes, I agree. But again, the local repo has to know which grafts were
> introduced by making the repo shallow.

I am not sure I understand.  grafts are grafts are grafts.
If the other side has grafts to connect otherwise unrelated commit
objects, I suspect the cloner needs to know about them, all of them,
in order to use the resulting clone.  Also, the upstream side would
need to know the cloner's altered world view in order to adjust the
commit ancestry graph, at least during cloning and fetching, and I do
not think that should be limited to the cauterizing entries created
by earlier shallow clone operations.  Manually created cauterizing
entries should also count (and, for that matter, so should grafts
that stitch unrelated lines together), no?
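The "grafts are grafts" view is visible in the file format itself: each line of `info/grafts` is a commit object name followed by zero or more parent object names, and a line with no parents is what this thread calls a cut-off or cauterizing entry. A minimal parser sketch, assuming 40-hex-digit SHA-1 names separated by single spaces; `struct graft_line` and `parse_graft_line` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HEX_LEN 40  /* length of a hex SHA-1 object name */

/* Hypothetical parsed form of one info/grafts line. */
struct graft_line {
	char commit[HEX_LEN + 1];
	int nr_parents;  /* 0: a cut-off ("cauterizing") entry */
};

static int is_hex_id(const char *s)
{
	for (int i = 0; i < HEX_LEN; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/*
 * Parse "<commit> [<parent>...]": IDs are 40 hex digits separated by
 * single spaces, optionally followed by a newline.
 * Returns 0 on success and -1 on a malformed line.
 */
static int parse_graft_line(const char *line, struct graft_line *out)
{
	size_t len;

	assert(line && out);
	len = strcspn(line, "\n");
	/* one commit ID, then N times (space + parent ID) */
	if (len < HEX_LEN || (len - HEX_LEN) % (HEX_LEN + 1) || !is_hex_id(line))
		return -1;
	out->nr_parents = (int)((len - HEX_LEN) / (HEX_LEN + 1));
	for (int i = 0; i < out->nr_parents; i++) {
		const char *p = line + HEX_LEN + (size_t)i * (HEX_LEN + 1);

		if (*p != ' ' || !is_hex_id(p + 1))
			return -1;
	}
	memcpy(out->commit, line, HEX_LEN);
	out->commit[HEX_LEN] = '\0';
	return 0;
}
```

Nothing in the parsed form distinguishes a cut-off from any other graft except `nr_parents == 0`, which is the position argued above.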
* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-01 20:27 ` Junio C Hamano
@ 2006-02-02 0:48 ` Johannes Schindelin
  2006-02-02 1:17 ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-02 0:48 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> > Worse, you cannot pull from older servers into shallow repos.
> >>
> >> "have X" means a different thing if you do not have matching
> >> grafts information, so I suspect that is fundamentally
> >> unsolvable.
> >
> > If the shallow-capable client could realize that the server is not
> > shallow-capable *and* the local repo is shallow, and refuse to operate
> > (unless called with "-f", in which case the result may or may not be a
> > broken repo, which has to be fixed up manually by copying
> > over ORIG_HEAD to HEAD).
>
> "If ... refuse to operate" then?

Just skip the "If". I'll start to enclose all emails I write in
<tired>..</tired> blocks.

> > Of course, the client has to know that the local repo is shallow, which it
> > must not determine by looking at the grafts file.
>
> Sorry, I fail to understand this requirement.  Why is it "it must not"?

See below.

> > If you introduce a different "have X" -- like "have-no-parent X" -- and
> > teach git-rev-list that "~A" means "traverse the tree of A, but not A's
> > parents", you'd basically have everything you need, right?
>
> If you have such a modified rev-list, yes.  I was having doubts
> about keeping an obvious correctness guarantee when doing such
> "rev-list ~A".

I think it would be trivial: just resolve ~A to the tree A points to:

-- snip --
[PATCH] rev-list: Support "~treeish"

Now, "git rev-list --objects ~some_rev" traverses just the tree of
some_rev.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 rev-list.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

43267e65c9ad933ad1a49005c4b61c23adaec372
diff --git a/rev-list.c b/rev-list.c
index 8012762..a196110 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -720,6 +720,24 @@ static void handle_one_commit(struct com
 	commit_list_insert(com, lst);
 }
 
+static void handle_tree(const unsigned char *sha1)
+{
+	struct object *object;
+
+	object = parse_object(sha1);
+	if (!object)
+		die("bad object %s", sha1_to_hex(sha1));
+
+	if (object->type == tree_type)
+		add_pending_object(object, "");
+	else if (object->type == commit_type) {
+		struct commit *commit = (struct commit *)object;
+		if (parse_commit(commit) < 0)
+			die("unable to parse commit %s", sha1_to_hex(sha1));
+		add_pending_object(&(commit->tree->object), "");
+	}
+}
+
 /* for_each_ref() callback does not allow user data -- Yuck. */
 static struct commit_list **global_lst;
 
@@ -865,6 +883,11 @@ int main(int argc, const char **argv)
 			flags = UNINTERESTING;
 			arg++;
 			limited = 1;
+		} else if (*arg == '~') {
+			if (get_sha1(arg + 1, sha1) < 0)
+				die("cannot get '%s'", arg);
+			handle_tree(sha1);
+			continue;
 		}
 		if (get_sha1(arg, sha1) < 0) {
 			struct stat st;
-- 
1.1.4.g9bd9d-dirty
-- snap --

> > Yes, I agree. But again, the local repo has to know which grafts were
> > introduced by making the repo shallow.
>
> I am not sure I understand.  grafts are grafts are grafts.

Exactly. And grafts are grafts are not necessarily cutoffs.

Now, is it possible that a fetch does something unintended when there are
grafts that are not cutoffs? I don't know yet, but I think so.

Ciao,
Dscho
* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02 0:48 ` Johannes Schindelin
@ 2006-02-02 1:17 ` Junio C Hamano
  2006-02-02 18:44 ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-02-02 1:17 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> If you have such a modified rev-list, yes.  I was having doubts
>> about keeping an obvious correctness guarantee when doing such
>> "rev-list ~A".
>
> I think it would be trivial: just resolve ~A to the tree A points to:

<tired> Hmph.  I thought you meant "have-only A" to mean something
similar to "have A", but additionally "do not assume I have things
behind A", and that you were going to extend rev-list to support the
~A syntax to do that.  I am a bit surprised to see that your
"rev-list ~A" includes A, rather than excluding A but not what is
behind A.  Where is the connection between this and "have-only A"?
</tired> ;-)

>> > Yes, I agree. But again, the local repo has to know which grafts were
>> > introduced by making the repo shallow.
>>
>> I am not sure I understand.  grafts are grafts are grafts.
>
> Exactly. And grafts are grafts are not necessarily cutoffs.
>
> Now, is it possible that a fetch does something unintended when there are
> grafts that are not cutoffs? I don't know yet, but I think so.

I think we are disagreeing, so "not Exactly".  I meant "grafts
are grafts; there are no cutoffs, they are also just grafts".  So
the answer to your question is "it does not matter".
* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02 1:17 ` Junio C Hamano
@ 2006-02-02 18:44 ` Johannes Schindelin
  2006-02-02 19:31 ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-02 18:44 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I think it would be trivial: just resolve ~A to the tree A points to:
>
> <tired> Hmph.  I thought you meant "have-only A" to mean something
> similar to "have A", but additionally "do not assume I have things
> behind A", and that you were going to extend rev-list to support the
> ~A syntax to do that.  I am a bit surprised to see that your
> "rev-list ~A" includes A, rather than excluding A but not what is
> behind A.  Where is the connection between this and "have-only A"?
> </tired> ;-)

<tired> My patch was wrong. You'd have to introduce a new flag saying:
Traverse this commit, but mark its parents as uninteresting. </tired>

> > Now, is it possible that a fetch does something unintended when there are
> > grafts that are not cutoffs? I don't know yet, but I think so.
>
> I think we are disagreeing, so "not Exactly".  I meant "grafts
> are grafts; there are no cutoffs, they are also just grafts".  So
> the answer to your question is "it does not matter".

Scenario: I have cvsimported a project. Using a graft, I told git that a
certain commit is indeed a merge between two branches. That is, in
addition to the parent the commit object tells us about, it has another
parent, which was the tip of another branch.

How would this graft be interpreted by the server we want to pull from? As
if we had cut off the history. Which we did not. In effect, we could be
sent many, many objects we already have.

Ciao,
Dscho
* Re: [PATCH] Shallow clone: low level machinery.
  2006-02-02 18:44 ` Johannes Schindelin
@ 2006-02-02 19:31 ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-02-02 19:31 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Scenario: I have cvsimported a project. Using a graft, I told git that a
> certain commit is indeed a merge between two branches. That is, in
> addition to the parent the commit object tells us about, it has another
> parent, which was the tip of another branch.
>
> How would this graft be interpreted by the server we want to pull from? As
> if we had cut off the history. Which we did not. In effect, we could be
> sent many, many objects we already have.

I thought the protocol sends the full graft file both ways.  The
uploader says "here are the grafts I have and use", and the
downloader modifies that and sends back the grafts it wants to be
used during the common revision discovery (aka building rev-list
parameters).  The most important modification during this exchange
is to cauterize the history at the --since=v2.6.14 commit (or tag).

The uploader may not have the fake parent you grafted onto a
commit.  You may have a graft entry that says commit W has X, Y
and Z as its parents, when its real parent is only X.  Y may be
some other commit in the project (i.e. the other end knows about it
but it is not a real parent of W), and Z may be from a development
track that the uploader has not even heard of.  You may say a
commit V does not have a parent, but that commit itself is from a
separate development track the uploader does not know about.

The uploader, however, should be able to at least honour, modulo
implementation bugs ;-), the "X and Y are both parents of W" part.
Just ignoring V and Z and keeping the usable part of the information
would be a reasonable fallback position [*1*].
And that should not result in a "many objects" situation when the
downloader says "Now I happen to have W, do not send things
reachable from it".  The uploader side should be able to omit what
is reachable from X or Y even though it cannot exclude things
reachable from Z.  Because the uploader does not even have Z, there
is no reason to worry about things reachable from Z being sent
unnecessarily to the downloader.

At least that was the intention.  "graft" messages are not about
sending "here are the cut-off points"; they are for agreeing on the
graft information both ends use during the common revision
computation.  The experimental code does not treat cut-offs any
differently from other grafts.

[Footnote]

*1* We might want to enhance the "shallow" protocol further to do
this exchange slightly differently.  The downloader first sends its
grafts (which may contain parents or graft/cutoff points that the
uploader does not have), and the uploader adjusts the received
grafts for commits like V and parents like Z and then adds its own
grafts.  The result is sent back to the downloader and becomes the
common set of grafts in effect during the common revision discovery.
This would contain commits and parents that the downloader does not
yet have, but that is not a problem for common revision discovery.

After the transfer is done, the downloader would adjust its "graft"
file if it made a new shallow clone, but otherwise it should not use
the information it received from the uploader, because things like
V and Z are not in this list.  I _think_ it would suffice to look at
each graft entry and to add that entry locally if it talks about a
commit the downloader does not have in its graft file.
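The uploader-side adjustment and the footnote's merge rule can be sketched as follows. This is a toy model, not the experimental code: object names are small integers, `known[id]` says whether the uploader has an object, and `struct graft`, `filter_grafts` and `merge_grafts` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARENTS 4

/* Hypothetical in-memory graft entry: commit `id` and its parents. */
struct graft {
	int id;
	int nr_parents;
	int parents[MAX_PARENTS];
};

/*
 * Uploader side: keep the part of the downloader's grafts that can be
 * honoured.  Entries for unknown commits (like V) are dropped, and
 * unknown parents (like Z) are removed from the surviving entries.
 * known[id] is true if the uploader has object `id`; ids are assumed
 * to be valid indexes.  Returns the number of entries written to out.
 */
static int filter_grafts(const struct graft *in, int n,
			 const bool *known, struct graft *out)
{
	int kept = 0;

	for (int i = 0; i < n; i++) {
		struct graft g = { .id = in[i].id, .nr_parents = 0 };

		if (!known[in[i].id])
			continue;
		for (int j = 0; j < in[i].nr_parents; j++)
			if (known[in[i].parents[j]])
				g.parents[g.nr_parents++] = in[i].parents[j];
		out[kept++] = g;
	}
	return kept;
}

/*
 * Downloader side, after a new shallow clone: add a received entry
 * only if the local graft list does not already mention that commit.
 * `cap` is the capacity of `local`.  Returns the new local count.
 */
static int merge_grafts(struct graft *local, int n_local, int cap,
			const struct graft *recv, int n_recv)
{
	for (int i = 0; i < n_recv; i++) {
		bool present = false;

		for (int j = 0; j < n_local; j++)
			if (local[j].id == recv[i].id)
				present = true;
		if (!present && n_local < cap)
			local[n_local++] = recv[i];
	}
	return n_local;
}
```

With W=1 grafted onto X=2, Y=3 and an unknown Z=4, plus an unknown parentless V=5, filtering keeps only "W has X and Y". The sketch leaves one question open: if every listed parent of a known commit is unknown, the entry survives with zero parents, i.e. as a cut-off.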
* Re: [RFC] shallow clone
  2006-01-30 18:46 ` Junio C Hamano
  2006-01-31 11:02 ` [PATCH] Shallow clone: low level machinery Junio C Hamano
@ 2006-01-31 14:20 ` Johannes Schindelin
  2006-01-31 20:59 ` Junio C Hamano
  2 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-01-31 14:20 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Mon, 30 Jan 2006, Junio C Hamano wrote:

> We need to realize that `upload-pack` that hears
> "have A, want B" is allowed to omit objects that appear in
> `ls-tree B` output but not in `ls-tree A`.  "have A" means not
> just "I have A", but "I have A and all of its ancestors", so
> just sending "have start_shallow" (or start_shallow^ for that
> matter) is not quite enough.

So how about adding a "have-single A" which would be translated to
"git-rev-list ~A", which in turn would only mark the tree and its
children, but not the parents?

Ciao,
Dscho
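The revert problem behind "have A", and what a "have-single A" would change, can be illustrated with a toy linear history whose trees are bitmasks of blobs. Assumptions: all names are hypothetical; `send_for_have` models what `upload-pack` is *allowed* to omit under "have A", not what the 2006 `rev-list` necessarily does; and `send_for_have_single` assumes the walk from B stops at A because A's parents are cut off.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: commit c has parent parent[c] (-1 for a root) and a tree
 * whose blobs are the set bits of tree[c].  Linear history only.
 */
static uint32_t reachable_blobs(const int *parent, const uint32_t *tree,
				int tip)
{
	uint32_t blobs = 0;

	for (int c = tip; c >= 0; c = parent[c])
		blobs |= tree[c];
	return blobs;
}

/* "have A, want B": A and all of A's ancestors count as present. */
static uint32_t send_for_have(const int *parent, const uint32_t *tree,
			      int want, int have)
{
	return reachable_blobs(parent, tree, want) &
	       ~reachable_blobs(parent, tree, have);
}

/*
 * Hypothetical "have-single A": only A's own tree counts as present,
 * and the walk from B stops at A, whose parents the shallow
 * downloader does not have.
 */
static uint32_t send_for_have_single(const int *parent, const uint32_t *tree,
				     int want, int have)
{
	uint32_t blobs = 0;

	assert(have >= 0);
	for (int c = want; c >= 0 && c != have; c = parent[c])
		blobs |= tree[c];
	return blobs & ~tree[have];
}
```

Take blob b (bit 1) present in commit 0, removed in A=1 and reverted back in B=2. Under "have A" it is never sent, even though a shallow downloader that owns only A's tree lacks it; under "have-single A" it is sent.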
* Re: [RFC] shallow clone
  2006-01-30 18:46 ` Junio C Hamano
  2006-01-31 11:02 ` [PATCH] Shallow clone: low level machinery Junio C Hamano
  2006-01-31 14:20 ` [RFC] shallow clone Johannes Schindelin
@ 2006-01-31 20:59 ` Junio C Hamano
  2006-02-01 14:47 ` Johannes Schindelin
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-01-31 20:59 UTC
To: Johannes Schindelin; +Cc: git

This is whacky, but another completely different strategy is to
introduce remote alternates.  If you can allow
objects/info/alternates to name a repository that is not on the
local disk, we can set the original remote repository we "clone"
from as one of the alternates, and teach read_sha1_file() to
locally cache objects we read from remote alternates.

After such a "shallow clone", the user may want to prime the
cache by something like:

	$ git-rev-list --objects v2.6.14..master | git-pack-objects --stdout >/dev/null

before going offline.  Obviously you can keep the resulting pack
instead of leaving things loose.

I am not seriously advocating this yet -- adding calls to http
and git transfer machinery in read_sha1_file(), which is as low
level as you can go, is not something I have the guts to do at the
moment.
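The remote-alternates idea amounts to a read-through cache in the object lookup path. A toy sketch, assuming hypothetical `struct object_store`, `store_lookup` and `read_object`; the real hook point would be read_sha1_file(), and the "remote" here is only an in-memory table standing in for an http or git-protocol fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_SLOTS 16

/*
 * Hypothetical object store: used both for the local cache and as an
 * in-memory stand-in for the remote repository named in
 * objects/info/alternates.  Strings must outlive the store.
 */
struct object_store {
	int nr;
	const char *names[STORE_SLOTS];
	const char *data[STORE_SLOTS];
	int remote_reads;  /* how often this store was asked over the "network" */
};

static const char *store_lookup(const struct object_store *s,
				const char *name)
{
	for (int i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return s->data[i];
	return NULL;
}

/* Serve from the local cache; on a miss, ask the remote and keep a copy. */
static const char *read_object(struct object_store *local,
			       struct object_store *remote, const char *name)
{
	const char *data = store_lookup(local, name);

	if (data)
		return data;
	remote->remote_reads++;
	data = store_lookup(remote, name);
	if (data && local->nr < STORE_SLOTS) {
		local->names[local->nr] = name;
		local->data[local->nr++] = data;
	}
	return data;
}
```

Priming the cache before going offline, as suggested above, corresponds to calling such a lookup once for every object listed by `git-rev-list --objects v2.6.14..master`; afterwards every read is served locally.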
* Re: [RFC] shallow clone
  2006-01-31 20:59 ` Junio C Hamano
@ 2006-02-01 14:47 ` Johannes Schindelin
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2006-02-01 14:47 UTC
To: Junio C Hamano; +Cc: git

Hi,

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> This is whacky, but another completely different strategy is to
> introduce remote alternates.

I'd rather go with the original plan. After all, you do not really need
the cut-off commit objects. All needed objects are available on the server
side: it just has to have a way to know which ones to send.

Ciao,
Dscho
* Re: [RFC] shallow clone
  [not found] ` <43DF1F1D.1060704@innova-card.com>
@ 2006-01-31 9:00 ` Franck
  0 siblings, 0 replies; 30+ messages in thread
From: Franck @ 2006-01-31 9:00 UTC
To: Junio C Hamano; +Cc: Git Mailing List

2006/1/31, Franck Bui-Huu <fbh.work@gmail.com>:
> Junio C Hamano wrote:
> > Shallow History Cloning
> > =======================
> >
> > One good thing about git repository is that each clone is a
> > freestanding and complete entity, and you can keep developing in
> > it offline, without talking to the outside world, knowing that
> > you can sync with them later when online.
>
> could we make a public repository from such a repo?
>
> > It is also a bad thing.  It gives people working on projects
> > with long development history stored in CVS a heart attack when
> > we tell them that their clones need to store the whole history.
>
> yeah, and I didn't survive :)

I didn't notice that other people were asking for this feature; that's great!

> > There was a suggestion by Linus to allow a partial clone using a
> > syntax like this:

[snip]

> >
> > There are some issues.
> >
> > . In the fetch above to obtain everything after v2.6.14, and
> >   future runs of `git fetch origin`, if a blob that is in the
> >   commit being fetched happens to match what used to be in a
> >   commit that is older than v2.6.14 (e.g. a patch was reverted),
> >   `upload-pack` running on the other end is free to omit sending
> >   it, because we are telling it that we are up to date with
> >   respect to v2.6.14.  Although I think the current `rev-list
> >   --objects` implementation does not always do such a revert
> >   optimization if the revert is to a blob in a revision that is
> >   sufficiently old, it is free to optimize more aggressively in
> >   the future.
>
> oops, I wasn't aware of that. I can still resolve this issue by hand, no?
>
> > . Later when the user decides to fetch older history, the
> >   operation can become a bit cumbersome.
> > [snip]
> >
> > Design
> > ------
> >
> > First, to bootstrap the process, we would need to add a way to
> > obtain all objects associated with a commit.  We could do a new
> > program, or we could implement this as a protocol extension to
> > `upload-pack`.  My current inclination is the latter.

Is the document in "Documentation/technical/pack-protocol.txt" up to
date? I can't find anything on multi_ack, for example.

> >
> > When talking with `upload-pack` that supports this extension,
> > the downloader can give one commit object name and get a pack
> > that contains all the objects in the tree associated with that
> > commit, plus the commit object itself.  This is a rough
> > equivalent of running the commit walker with the `-t` flag.

[snip]

> >
> >
> > Anybody want to try?
>
> well, you have almost done the job with your analysis, but I've never
> taken a look at git's deep internals, and with my lack of time it would
> take too long...

Thanks
-- 
Franck
Thread overview: 30+ messages
2006-01-30 7:18 [RFC] shallow clone Junio C Hamano
2006-01-30 11:39 ` Johannes Schindelin
2006-01-30 11:58 ` Simon Richter
2006-01-30 12:13 ` Johannes Schindelin
2006-01-30 13:25 ` Simon Richter
2006-01-30 19:25 ` Junio C Hamano
2006-01-31 11:28 ` Johannes Schindelin
2006-01-31 13:05 ` Simon Richter
2006-01-31 13:31 ` Johannes Schindelin
2006-01-31 14:23 ` Simon Richter
2006-01-30 19:25 ` Junio C Hamano
2006-01-31 8:37 ` Franck
2006-01-31 8:51 ` Junio C Hamano
2006-01-31 11:11 ` Franck
2006-01-30 18:46 ` Junio C Hamano
2006-01-31 11:02 ` [PATCH] Shallow clone: low level machinery Junio C Hamano
2006-01-31 13:58 ` Johannes Schindelin
2006-01-31 17:49 ` Junio C Hamano
2006-01-31 18:06 ` Johannes Schindelin
2006-01-31 18:22 ` Junio C Hamano
2006-02-01 14:33 ` Johannes Schindelin
2006-02-01 20:27 ` Junio C Hamano
2006-02-02 0:48 ` Johannes Schindelin
2006-02-02 1:17 ` Junio C Hamano
2006-02-02 18:44 ` Johannes Schindelin
2006-02-02 19:31 ` Junio C Hamano
2006-01-31 14:20 ` [RFC] shallow clone Johannes Schindelin
2006-01-31 20:59 ` Junio C Hamano
2006-02-01 14:47 ` Johannes Schindelin
[not found] ` <43DF1F1D.1060704@innova-card.com>
2006-01-31 9:00 ` Franck