* [PATCH 0/2] Subtree clone?
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This idea sounds quite nice to me. That is, instead of modifying git
core to support narrow/partial clone, {upload,fetch}-pack is modified
to give clients enough objects so that they can reconstruct a valid
tree. Users are free to do whatever they want on that tree. When they
want to push changes back, the Git client creates proper commit/tree
objects and pushes them.

The two patches in this series allow git to send objects of a subtree
to the client, or just a barebone subtree without blobs. The client
can rewrite commits and throw away the old commits.

I don't want to add much computation to the server side, and subtree
looks like a good fit (i.e. simply prefixcmp). Sparse checkout can
then be used to shape the worktree if subtree is not good enough.

All the hard work is on the client side, and git-subtree is a good
candidate. Well, the idea is inspired by recent discussions of
git-subtree vs git-submodule anyway.

Lazy clone does something similar. However, lazy clone requires
connectivity to upstream. Lazy clone also exposes a security issue,
allowing the client to get any object it wants.

Comments?

Nguyễn Thái Ngọc Duy (2):
  upload-pack: support subtree packing
  fetch-pack: support --subtree and --commit-subtree options

 builtin/fetch-pack.c |   15 +++++++++++++++
 upload-pack.c        |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletions(-)

* [PATCH 1/2] upload-pack: support subtree packing
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This patch adds a new capability "subtree", which supports two new
requests "subtree" and "commit-subtree".

"subtree" asks upload-pack to create a pack that contains only blobs
from the given tree prefix (and necessary commits/trees to reach
those blobs).

"commit-subtree" asks upload-pack to create a pack that contains trees
of the given prefix (and necessary commits/trees to reach those trees).

With a "subtree" request, the Git client may then rewrite commits to
create a valid commit tree again, so that users can work on it
independently. When users want to push from such a tree,
"commit-subtree" may then be used to re-match what users have against
what is in upstream, and to recreate proper commits to push.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index dc464d7..f97296a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -41,6 +41,8 @@ static int use_sideband;
 static int debug_fd;
 static int advertise_refs;
 static int stateless_rpc;
+static char *subtree;
+static int commit_subtree;
 
 static void reset_timeout(void)
 {
@@ -89,6 +91,17 @@ static void show_object(struct object *obj, const struct name_path *path, const
 	 */
 	const char *name = path_name(path, component);
 	const char *ep = strchr(name, '\n');
+	if (subtree) {
+		int len = strlen(name);
+		/* parent trees should always be kept */
+		if (obj->type == OBJ_TREE && !prefixcmp(subtree, name) && subtree[len] == '/')
+			; /* in */
+		else if (commit_subtree)
+			goto out;
+		else if (prefixcmp(name, subtree))
+			goto out;
+	}
+
 	if (ep) {
 		fprintf(pack_pipe, "%s %.*s\n", sha1_to_hex(obj->sha1),
 		       (int) (ep - name),
@@ -97,6 +110,7 @@ static void show_object(struct object *obj, const struct name_path *path, const
 	else
 		fprintf(pack_pipe, "%s %s\n",
 			sha1_to_hex(obj->sha1), name);
+out:
 	free((char *)name);
 }
 
@@ -504,6 +518,21 @@ static void receive_needs(void)
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "subtree ")) {
+			int len = strlen(line+8);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+8, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			continue;
+		}
+		if (!prefixcmp(line, "commit-subtree ")) {
+			int len = strlen(line+15);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+15, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			commit_subtree = 1;
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -623,7 +652,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed subtree";
 	struct object *o = parse_object(sha1);
 
 	if (!o)
-- 
1.7.1.rc1.69.g24c2f7

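The filtering rule that this patch adds to show_object() can be read as a
standalone predicate. What follows is a minimal sketch under stated
assumptions, not code from the patch: keep_object(), starts_with() and the
sketch_type enum are hypothetical names introduced for illustration, and
starts_with() returns 1 on a match, which is the inverse of prefixcmp().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical object kinds; git defines its own object type constants. */
enum sketch_type { SKETCH_BLOB, SKETCH_TREE };

/* Returns 1 when `str` starts with `prefix` (inverse sense of prefixcmp()). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Decide whether an object reported at path `name` would be written to the
 * pack, following the three branches the patch adds to show_object().
 * `subtree` is assumed to be a directory prefix ending in '/'.
 */
static int keep_object(enum sketch_type type, const char *name,
		       const char *subtree, int commit_subtree)
{
	size_t len;

	assert(name && subtree);
	len = strlen(name);

	/* A tree that is a parent directory of the prefix is always kept. */
	if (type == SKETCH_TREE && starts_with(subtree, name) &&
	    subtree[len] == '/')
		return 1;
	/* "commit-subtree": nothing other than those parent trees is sent. */
	if (commit_subtree)
		return 0;
	/* "subtree": keep only paths that start with the prefix. */
	return starts_with(name, subtree);
}
```

With the prefix "Documentation/technical/", the tree "Documentation" and any
blob below "Documentation/technical/" are kept, while "Makefile" is dropped.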
* Re: [PATCH 1/2] upload-pack: support subtree packing
From: Ævar Arnfjörð Bjarmason @ 2010-07-27 13:15 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

2010/7/26 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:

> +               int len = strlen(name);

Don't you mean "size_t len = strlen(name)", or to use a cast?

* Re: [PATCH 1/2] upload-pack: support subtree packing
From: Shawn O. Pearce @ 2010-07-27 14:46 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> This patch adds a new capability "subtree", which supports two new
> requests "subtree" and "commit-subtree".
>
> "subtree" asks upload-pack to create a pack that contains only blobs
> from the given tree prefix (and necessary commits/trees to reach
> those blobs).
>
> "commit-subtree" asks upload-pack to create a pack that contains trees of
> the given prefix (and necessary commits/trees to reach those trees)
>
> With "subtree" request, Git client may then rewrite commits to create
> a valid commit tree again, so that users can work on it independently.
> When users want to push from such a tree, "commit-subtree" may then be
> used to re-match what users have and what is in upstream, recreate
> proper push commits.

I disagree with a lot of this... but the idea is quite cool.

I like the "subtree" command. Being able to clone down only part of
the repository is a nice feature, and the implementation of subtree
seems simple enough for the server. It only has to emit some of
the paths, but the entire commit DAG. This is pretty simple to
implement server side and is very lightweight.

But I disagree with the client rewriting the commits in order to
work with them locally. Doing so means you can't take a commit
from your team's issue tracker and look it up. And any commit
you create can't be pushed back to the server without rewriting.
It's messy for the end-user to work with.

I would prefer doing something more like what we do with shallow
on the client side. Record in a magic file the path(s) that we
did actually obtain. During fsck, rev-list, or read-tree the
client skips over any paths that don't match that file's listing.
Then we can keep the same commit SHA-1s, but we won't complain that
there are objects missing.

The downside is, a lot of the client code is impacted, and that
is why nobody has done it yet. Tools like rebase or cherry-pick
start to behave funny. What does it mean to rebase or cherry-pick
a commit that has deltas outside of the area you have cloned?
It probably should abort and refuse to execute. But `git show`
should still work, which implies you need a way to toggle the
diff code to either skip or fail on deltas outside of the shallow
path space.

-- 
Shawn.

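The "record the obtained paths, then skip or fail on anything outside them"
idea from the message above can be sketched as a small path check. This is
only an illustration under assumptions: the recorded paths are taken to be
directory prefixes ending in '/', and path_is_present(), check_path() and
both enums are hypothetical names, not part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy for paths outside the cloned area. */
enum outside_policy { OUTSIDE_SKIP, OUTSIDE_FAIL };

/* Hypothetical result of checking a single path. */
enum path_result { PATH_OK, PATH_SKIPPED, PATH_ERROR };

/* Returns 1 if `path` lies under one of the recorded directory prefixes. */
static int path_is_present(const char *path,
			   const char *const *prefixes, size_t nr)
{
	size_t i;

	assert(path);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(prefixes[i]);
		if (!strncmp(path, prefixes[i], len))
			return 1;
	}
	return 0;
}

/*
 * A read-only command such as `git show` could use OUTSIDE_SKIP, while a
 * history-rewriting command such as rebase could use OUTSIDE_FAIL.
 */
static enum path_result check_path(const char *path,
				   const char *const *prefixes, size_t nr,
				   enum outside_policy policy)
{
	if (path_is_present(path, prefixes, nr))
		return PATH_OK;
	return policy == OUTSIDE_SKIP ? PATH_SKIPPED : PATH_ERROR;
}
```

The same lookup serves both behaviours; only the policy argument decides
whether a path outside the recorded prefixes is silently skipped or treated
as an error.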
* Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing)
From: Avery Pennarun @ 2010-07-27 18:51 UTC
  To: Shawn O. Pearce; +Cc: Nguyễn Thái Ngọc Duy, git

On Tue, Jul 27, 2010 at 07:46:05AM -0700, Shawn O. Pearce wrote:
> But I disagree with the client rewriting the commits in order to
> work with them locally. Doing so means you can't take a commit
> from your team's issue tracker and look it up. And any commit
> you create can't be pushed back to the server without rewriting.
> Its messy for the end-user to work with.

Yeah, that doesn't sound ideal. And I wrote git-subtree, which does
exactly that, so I should know :)

> I would prefer doing something more like what we do with shallow
> on the client side. Record in a magic file the path(s) that we
> did actually obtain. During fsck, rev-list, or read-tree the
> client skips over any paths that don't match that file's listing.
> Then we can keep the same commit SHA-1s, but we won't complain that
> there are objects missing.

Disclaimer: I've never looked at any of the fetch code.

But I've been thinking that a really elegant way to solve the problem could
be to have a user-configurable "get the missing objects" callback. If any
part of git that *needs* an object can't find it, it calls this callback to
go try to retrieve it (either just that one object, or it can request to
download the object recursively, i.e. everything it points to).

Then shallow clones could just auto-fill themselves if you really need a
prior version, for example.

It's also conceivable that we could limit this just to blobs: downloading
the complete set of commit objects, and probably even the complete set of
tree objects, is probably not that expensive. And that would allow someone
to do virtually all operations (other than three-way merges to resolve blob
conflicts) without having the entire repo.

I say it could be user-configurable because that's where you could plug in
a gittorrent, or a tool that just tries fetching from a series of
repositories in turn, etc.

Have fun,

Avery

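A "get the missing objects" callback of the kind described above might have
roughly the following shape. This is a toy sketch and not git code: the
object store is just a list of names, and toy_store, missing_object_fn,
need_object() and fetch_always() are all hypothetical names chosen for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 16

/* Toy object store: a list of object names that are present locally. */
struct toy_store {
	const char *names[MAX_OBJECTS];
	size_t nr;
};

/*
 * Hypothetical callback type: try to obtain `name`, optionally together
 * with everything it points to (`recursive`). Returns 0 on success.
 */
typedef int (*missing_object_fn)(struct toy_store *store, const char *name,
				 int recursive, void *data);

static int store_has(const struct toy_store *store, const char *name)
{
	size_t i;

	for (i = 0; i < store->nr; i++)
		if (!strcmp(store->names[i], name))
			return 1;
	return 0;
}

static int store_add(struct toy_store *store, const char *name)
{
	if (store->nr >= MAX_OBJECTS)
		return -1;
	store->names[store->nr++] = name;
	return 0;
}

/*
 * Look an object up; on a miss, give the callback one chance to fetch it.
 * Returns 1 if the object is available afterwards, 0 otherwise.
 */
static int need_object(struct toy_store *store, const char *name,
		       missing_object_fn on_missing, void *data)
{
	assert(store && name);
	if (store_has(store, name))
		return 1;
	if (!on_missing || on_missing(store, name, 0, data))
		return 0;
	return store_has(store, name);
}

/* Example callback: pretends every fetch succeeds and counts the calls. */
static int fetch_always(struct toy_store *store, const char *name,
			int recursive, void *data)
{
	(void)recursive;
	++*(int *)data;
	return store_add(store, name);
}
```

Passing NULL as the callback gives the "never download automatically"
behaviour, so a caller can decide per command whether a miss may trigger a
fetch.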
* Re: Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing)
From: Nguyen Thai Ngoc Duy @ 2010-07-27 22:32 UTC
  To: Avery Pennarun; +Cc: Shawn O. Pearce, git

2010/7/28 Avery Pennarun <apenwarr@gmail.com>:
> But I've been thinking that a really elegant way to solve the problem could
> be to have a user-configurable "get the missing objects" callback. If any
> part of git that *needs* an object can't find it, it calls this callback to
> go try to retrieve it (either just that one object, or it can request to
> download the object recursively, ie. everything it points to).
>
> Then shallow clones could just auto-fill themselves if you really need a
> prior version, for example.

I think that's what lazy clone does in [1].

[1] http://thread.gmane.org/gmane.comp.version-control.git/73117/focus=73935
-- 
Duy

* Re: Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing)
From: Elijah Newren @ 2010-07-28 1:53 UTC
  To: Avery Pennarun; +Cc: Shawn O. Pearce, Nguyễn Thái Ngọc, git

2010/7/27 Avery Pennarun <apenwarr@gmail.com>:
> But I've been thinking that a really elegant way to solve the problem could
> be to have a user-configurable "get the missing objects" callback. If any
> part of git that *needs* an object can't find it, it calls this callback to
> go try to retrieve it (either just that one object, or it can request to
> download the object recursively, ie. everything it points to).
>
> Then shallow clones could just auto-fill themselves if you really need a
> prior version, for example.

What counts as "needing" an object? Does 'git log -Sfoo' or 'git log
--stat' need all missing blobs? I'd personally dislike having such
commands automatically result in huge downloads, but I'd probably
dislike the automatic downloading in general so perhaps I'm just a
misfit for the lazy clone usecase. It's still an interesting question
though -- what counts as needed?

* Re: Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing)
From: Avery Pennarun @ 2010-07-28 2:00 UTC
  To: Elijah Newren; +Cc: Shawn O. Pearce, Nguyễn Thái Ngọc, git

2010/7/27 Elijah Newren <newren@gmail.com>:
> 2010/7/27 Avery Pennarun <apenwarr@gmail.com>:
>> But I've been thinking that a really elegant way to solve the problem could
>> be to have a user-configurable "get the missing objects" callback. If any
>> part of git that *needs* an object can't find it, it calls this callback to
>> go try to retrieve it (either just that one object, or it can request to
>> download the object recursively, ie. everything it points to).
>>
>> Then shallow clones could just auto-fill themselves if you really need a
>> prior version, for example.
>
> What counts as "needing" an object? Does 'git log -Sfoo' or 'git log
> --stat' need all missing blobs? I'd personally dislike having such
> commands automatically result in huge downloads, but I'd probably
> dislike the automatic downloading in general so perhaps I'm just a
> misfit for the lazy clone usecase. It's still an interesting question
> though -- what counts as needed?

I would say no by default (unless maybe a config option is set) but you'd
want to be able to force it on for a particular command.

Have fun,

Avery

* Re: [PATCH 1/2] upload-pack: support subtree packing
From: Nguyen Thai Ngoc Duy @ 2010-07-27 22:29 UTC
  To: Shawn O. Pearce; +Cc: git

On Wed, Jul 28, 2010 at 12:46 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> This patch adds a new capability "subtree", which supports two new
>> requests "subtree" and "commit-subtree".
>>
>> "subtree" asks upload-pack to create a pack that contains only blobs
>> from the given tree prefix (and necessary commits/trees to reach
>> those blobs).
>>
>> "commit-subtree" asks upload-pack to create a pack that contains trees of
>> the given prefix (and necessary commits/trees to reach those trees)
>>
>> With "subtree" request, Git client may then rewrite commits to create
>> a valid commit tree again, so that users can work on it independently.
>> When users want to push from such a tree, "commit-subtree" may then be
>> used to re-match what users have and what is in upstream, recreate
>> proper push commits.
>
> I disagree with a lot of this... but the idea is quite cool.
>
> I like the "subtree" command, being able to clone down only part of
> the repository is a nice feature, and the implementation of subtree
> seems simple enough for the server. It only has to emit some of
> the paths, but the entire commit DAG. This is pretty simple to
> implement server side and is very lightweight.

Another point is that the server side can disallow full clones
completely and grant permission to clone on a per-directory basis.
Enterprise users would love this.

> But I disagree with the client rewriting the commits in order to
> work with them locally. Doing so means you can't take a commit
> from your team's issue tracker and look it up. And any commit
> you create can't be pushed back to the server without rewriting.
> Its messy for the end-user to work with.

That's what happens with git-subtree in its current form (I don't know
much about git-subtree though). But I guess if they can use
git-subtree as it is now, they can live with subtree clone plus
git-subtree just fine.

> I would prefer doing something more like what we do with shallow
> on the client side. Record in a magic file the path(s) that we
> did actually obtain. During fsck, rev-list, or read-tree the
> client skips over any paths that don't match that file's listing.
> Then we can keep the same commit SHA-1s, but we won't complain that
> there are objects missing.

That's another option. With all trees available, sparse checkout can be
used, as long as you limit your operations to a subdirectory. Full-tree
commands like git-fsck can be taught to realize it's a subtree clone
and stop complaining about non-existent objects. The downloaded pack
would be bigger (I don't know by how much). And it also defeats the
enterprise point above.

> The downside is, a lot of the client code is impacted, and that
> is why nobody has done it yet. Tools like rebase or cherry-pick
> start to behave funny. What does it mean to rebase or cherry-pick
> a commit that has deltas outside of the area you have cloned?
> It probably should abort and refuse to execute. But `git show`
> should still work, which implies you need a way to toggle the
> diff code to either skip or fail on deltas outside of the shallow
> path space.

Where do those deltas come from? I thought that, with proper path
limiting in upload-pack, pack-objects would never generate anything
that needs things outside the area?

Sounds to me like git-subtree for the short term, and doing without
git-subtree in the long term :)
-- 
Duy

* [PATCH 2/2] fetch-pack: support --subtree and --commit-subtree options
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

These options are simply turned into upload-pack's "subtree" and
"commit-subtree" requests, respectively.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch-pack.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..0bc7f6d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -14,6 +14,8 @@ static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
 static int unpack_limit = 100;
 static int prefer_ofs_delta = 1;
+static const char *subtree;
+static int commit_subtree;
 static struct fetch_pack_args args = {
 	/* .uploadpack = */ "git-upload-pack",
 };
@@ -237,6 +239,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	for_each_ref(rev_list_insert_ref, NULL);
 
 	fetching = 0;
+	if (subtree)
+		packet_buf_write(&req_buf, "%s %s\n", commit_subtree ? "commit-subtree" : "subtree", subtree);
 	for ( ; refs ; refs = refs->next) {
 		unsigned char *remote = refs->old_sha1;
 		const char *remote_hex;
@@ -692,6 +696,8 @@ static struct ref *do_fetch_pack(int fd[2],
 
 	if (is_repository_shallow() && !server_supports("shallow"))
 		die("Server does not support shallow clients");
+	if (subtree && !server_supports("subtree"))
+		die("Server does not support subtree");
 	if (server_supports("multi_ack_detailed")) {
 		if (args.verbose)
 			fprintf(stderr, "Server supports multi_ack_detailed\n");
@@ -860,6 +866,15 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				pack_lockfile_ptr = &pack_lockfile;
 				continue;
 			}
+			if (!prefixcmp(arg, "--subtree=")) {
+				subtree = arg + 10;
+				continue;
+			}
+			if (!prefixcmp(arg, "--commit-subtree=")) {
+				subtree = arg + 17;
+				commit_subtree = 1;
+				continue;
+			}
 			usage(fetch_pack_usage);
 		}
 		dest = (char *)arg;
-- 
1.7.1.rc1.69.g24c2f7

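The client side of the fetch-pack patch in this thread amounts to two steps:
recognise `--subtree=` or `--commit-subtree=`, then emit one request line
before the wants. The sketch below isolates those steps without any of git's
pkt-line machinery. It is an illustration only: subtree_opts,
parse_subtree_arg() and format_subtree_request() are hypothetical names, and
the request is written into a plain buffer rather than through
packet_buf_write().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two file-scope variables in the patch. */
struct subtree_opts {
	const char *subtree;	/* path prefix, or NULL if not requested */
	int commit_subtree;	/* 1 when --commit-subtree= was given */
};

/* Returns 1 if `arg` was one of the two options, 0 otherwise. */
static int parse_subtree_arg(const char *arg, struct subtree_opts *opts)
{
	static const char opt_subtree[] = "--subtree=";
	static const char opt_commit[] = "--commit-subtree=";

	assert(arg && opts);
	if (!strncmp(arg, opt_subtree, sizeof(opt_subtree) - 1)) {
		/* Same offset as "arg + 10" in the patch. */
		opts->subtree = arg + sizeof(opt_subtree) - 1;
		return 1;
	}
	if (!strncmp(arg, opt_commit, sizeof(opt_commit) - 1)) {
		/* Same offset as "arg + 17" in the patch. */
		opts->subtree = arg + sizeof(opt_commit) - 1;
		opts->commit_subtree = 1;
		return 1;
	}
	return 0;
}

/* Format the request line payload; returns the result of snprintf(). */
static int format_subtree_request(char *buf, size_t size,
				  const struct subtree_opts *opts)
{
	return snprintf(buf, size, "%s %s\n",
			opts->commit_subtree ? "commit-subtree" : "subtree",
			opts->subtree);
}
```

Using sizeof on the option strings avoids the hand-counted offsets 10 and 17;
that is a stylistic choice in this sketch, not something the patch does.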