git.vger.kernel.org archive mirror
* [PATCH 0/2] Subtree clone?
@ 2010-07-26 23:36 Nguyễn Thái Ngọc Duy
  2010-07-26 23:36 ` [PATCH 1/2] upload-pack: support subtree packing Nguyễn Thái Ngọc Duy
  2010-07-26 23:36 ` [PATCH 2/2] fetch-pack: support --subtree and --commit-subtree options Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This idea sounds quite nice to me. That is, instead of modifying git
core to support narrow/partial clone, {upload,fetch}-pack is modified
to give the client enough objects that it can reconstruct a valid tree.
Users are free to do whatever they want on that tree. When they want to
push changes back, the Git client creates proper commit/tree objects
and pushes them.

The two patches in this series allow git to send the objects of a
subtree to the client, or just a bare-bones subtree without blobs. The
client can then rewrite commits and throw away the old ones.

I don't want to add much computation to the server side, and subtree
looks like a good fit (i.e. a simple prefixcmp). Sparse checkout can
then be used to shape the worktree if subtree is not good enough.

All the hard work is on the client side, and git-subtree is a good
candidate for it. Well, the idea was inspired by the recent discussions
of git-subtree vs git-submodule anyway.

Lazy clone does something similar. However, lazy clone requires
connectivity to upstream. Lazy clone also exposes a security issue: it
allows the client to get any object it wants.

Comments?

Nguyễn Thái Ngọc Duy (2):
  upload-pack: support subtree packing
  fetch-pack: support --subtree and --commit-subtree options

 builtin/fetch-pack.c |   15 +++++++++++++++
 upload-pack.c        |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletions(-)


* [PATCH 1/2] upload-pack: support subtree packing
  2010-07-26 23:36 [PATCH 0/2] Subtree clone? Nguyễn Thái Ngọc Duy
@ 2010-07-26 23:36 ` Nguyễn Thái Ngọc Duy
  2010-07-27 13:15   ` Ævar Arnfjörð Bjarmason
  2010-07-27 14:46   ` Shawn O. Pearce
  2010-07-26 23:36 ` [PATCH 2/2] fetch-pack: support --subtree and --commit-subtree options Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This patch adds a new capability "subtree", which supports two new
requests "subtree" and "commit-subtree".

"subtree" asks upload-pack to create a pack that contains only blobs
from the given tree prefix (and the commits/trees necessary to reach
those blobs).

"commit-subtree" asks upload-pack to create a pack that contains only
the trees of the given prefix (and the commits/trees necessary to reach
those trees).

With a "subtree" request, the Git client may then rewrite commits to
create a valid commit tree again, so that users can work on it
independently. When users want to push from such a tree,
"commit-subtree" may then be used to re-match what users have against
what is upstream and recreate proper commits to push.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index dc464d7..f97296a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -41,6 +41,8 @@ static int use_sideband;
 static int debug_fd;
 static int advertise_refs;
 static int stateless_rpc;
+static char *subtree;
+static int commit_subtree;
 
 static void reset_timeout(void)
 {
@@ -89,6 +91,17 @@ static void show_object(struct object *obj, const struct name_path *path, const
 	 */
 	const char *name = path_name(path, component);
 	const char *ep = strchr(name, '\n');
+	if (subtree) {
+		int len = strlen(name);
+		/* parent trees should always be kept */
+		if (obj->type == OBJ_TREE && !prefixcmp(subtree, name) && subtree[len] == '/')
+			; /* in */
+		else if (commit_subtree)
+			goto out;
+		else if (prefixcmp(name, subtree))
+			goto out;
+	}
+
 	if (ep) {
 		fprintf(pack_pipe, "%s %.*s\n", sha1_to_hex(obj->sha1),
 		       (int) (ep - name),
@@ -97,6 +110,7 @@ static void show_object(struct object *obj, const struct name_path *path, const
 	else
 		fprintf(pack_pipe, "%s %s\n",
 				sha1_to_hex(obj->sha1), name);
+out:
 	free((char *)name);
 }
 
@@ -504,6 +518,21 @@ static void receive_needs(void)
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "subtree ")) {
+			int len = strlen(line+8);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+8, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			continue;
+		}
+		if (!prefixcmp(line, "commit-subtree ")) {
+			int len = strlen(line+15);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+15, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			commit_subtree = 1;
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -623,7 +652,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed subtree";
 	struct object *o = parse_object(sha1);
 
 	if (!o)
-- 
1.7.1.rc1.69.g24c2f7
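
The path test in show_object() above can also be read as a small
standalone predicate. The following is only a sketch of the rule as the
commit message describes it, not the code in this patch: keep_object(),
path_within(), is_ancestor_dir() and enum obj_kind are hypothetical
names, and the prefix is assumed to be given without a trailing slash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical object kinds; only trees and blobs carry a path here. */
enum obj_kind { KIND_TREE, KIND_BLOB };

/* Returns 1 if `path` equals `prefix` or lies underneath it. */
static int path_within(const char *path, const char *prefix)
{
	size_t n = strlen(prefix);

	if (strncmp(path, prefix, n))
		return 0;
	/* The match must end on a path-component boundary. */
	return path[n] == '\0' || path[n] == '/';
}

/* Returns 1 if the tree at `path` is a proper ancestor of `prefix`. */
static int is_ancestor_dir(const char *path, const char *prefix)
{
	size_t n = strlen(path);

	if (n == 0)
		return 1; /* the root tree leads to every prefix */
	return !strncmp(prefix, path, n) && prefix[n] == '/';
}

/*
 * Decide whether an object should go into the pack.
 * trees_only == 0 models "subtree", trees_only == 1 "commit-subtree".
 */
static int keep_object(enum obj_kind kind, const char *path,
		       const char *prefix, int trees_only)
{
	assert(path && prefix);
	/* Parent trees are always needed to reach the prefix. */
	if (kind == KIND_TREE && is_ancestor_dir(path, prefix))
		return 1;
	if (!path_within(path, prefix))
		return 0;
	/* Inside the prefix: trees always, blobs only for "subtree". */
	return kind == KIND_TREE || !trees_only;
}
```

With the prefix "src/lib", the root tree and "src" are kept in both
modes, "src/lib/a.c" is kept for "subtree" but skipped for
"commit-subtree", and "src/libfoo.c" is always skipped because the
match has to end on a path-component boundary.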


* [PATCH 2/2] fetch-pack: support --subtree and --commit-subtree options
  2010-07-26 23:36 [PATCH 0/2] Subtree clone? Nguyễn Thái Ngọc Duy
  2010-07-26 23:36 ` [PATCH 1/2] upload-pack: support subtree packing Nguyễn Thái Ngọc Duy
@ 2010-07-26 23:36 ` Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-26 23:36 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

These options are simply translated into upload-pack's "subtree" and
"commit-subtree" requests, respectively.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch-pack.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..0bc7f6d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -14,6 +14,8 @@ static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
 static int unpack_limit = 100;
 static int prefer_ofs_delta = 1;
+static const char *subtree;
+static int commit_subtree;
 static struct fetch_pack_args args = {
 	/* .uploadpack = */ "git-upload-pack",
 };
@@ -237,6 +239,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	for_each_ref(rev_list_insert_ref, NULL);
 
 	fetching = 0;
+	if (subtree)
+		packet_buf_write(&req_buf, "%s %s\n", commit_subtree ? "commit-subtree" : "subtree", subtree);
 	for ( ; refs ; refs = refs->next) {
 		unsigned char *remote = refs->old_sha1;
 		const char *remote_hex;
@@ -692,6 +696,8 @@ static struct ref *do_fetch_pack(int fd[2],
 
 	if (is_repository_shallow() && !server_supports("shallow"))
 		die("Server does not support shallow clients");
+	if (subtree && !server_supports("subtree"))
+		die("Server does not support subtree");
 	if (server_supports("multi_ack_detailed")) {
 		if (args.verbose)
 			fprintf(stderr, "Server supports multi_ack_detailed\n");
@@ -860,6 +866,15 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				pack_lockfile_ptr = &pack_lockfile;
 				continue;
 			}
+			if (!prefixcmp(arg, "--subtree=")) {
+				subtree = arg + 10;
+				continue;
+			}
+			if (!prefixcmp(arg, "--commit-subtree=")) {
+				subtree = arg + 17;
+				commit_subtree = 1;
+				continue;
+			}
 			usage(fetch_pack_usage);
 		}
 		dest = (char *)arg;
-- 
1.7.1.rc1.69.g24c2f7
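
On the wire, the option boils down to one extra request line. As a rough
sketch (format_subtree_request() and parse_subtree_request() are
hypothetical helpers, not git functions; real code would go through
packet_buf_write() as in the hunk above), both ends could look like
this. Unlike the server-side hunk in patch 1/2, the parser stops at the
newline if there is one instead of assuming that it is present.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Client side: write "subtree <prefix>\n" or "commit-subtree <prefix>\n"
 * into buf. Returns the line length, or -1 if it does not fit.
 */
static int format_subtree_request(char *buf, size_t size,
				  const char *prefix, int commit_subtree)
{
	int n = snprintf(buf, size, "%s %s\n",
			 commit_subtree ? "commit-subtree" : "subtree",
			 prefix);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Server side: if `line` is one of the two requests, return a malloc'ed
 * copy of the prefix and set *commit_subtree. Returns NULL for any
 * other line (or on allocation failure).
 */
static char *parse_subtree_request(const char *line, int *commit_subtree)
{
	static const char plain[] = "subtree ";
	static const char commit[] = "commit-subtree ";
	const char *arg;
	size_t len;
	char *copy;

	assert(line && commit_subtree);
	if (!strncmp(line, plain, sizeof(plain) - 1)) {
		arg = line + sizeof(plain) - 1;
		*commit_subtree = 0;
	} else if (!strncmp(line, commit, sizeof(commit) - 1)) {
		arg = line + sizeof(commit) - 1;
		*commit_subtree = 1;
	} else {
		return NULL;
	}

	len = strcspn(arg, "\n"); /* length up to the newline, if any */
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, arg, len);
	copy[len] = '\0';
	return copy;
}
```

The caller owns the returned string and is expected to free() it.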


* Re: [PATCH 1/2] upload-pack: support subtree packing
  2010-07-26 23:36 ` [PATCH 1/2] upload-pack: support subtree packing Nguyễn Thái Ngọc Duy
@ 2010-07-27 13:15   ` Ævar Arnfjörð Bjarmason
  2010-07-27 14:46   ` Shawn O. Pearce
  1 sibling, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-07-27 13:15 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

2010/7/26 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> +               int len = strlen(name);

Don't you mean "size_t len = strlen(name)", or at least to use a cast?


* Re: [PATCH 1/2] upload-pack: support subtree packing
  2010-07-26 23:36 ` [PATCH 1/2] upload-pack: support subtree packing Nguyễn Thái Ngọc Duy
  2010-07-27 13:15   ` Ævar Arnfjörð Bjarmason
@ 2010-07-27 14:46   ` Shawn O. Pearce
  2010-07-27 18:51     ` Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing) Avery Pennarun
  2010-07-27 22:29     ` [PATCH 1/2] upload-pack: support subtree packing Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 10+ messages in thread
From: Shawn O. Pearce @ 2010-07-27 14:46 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> This patch adds a new capability "subtree", which supports two new
> requests "subtree" and "commit-subtree".
> 
> "subtree" asks upload-pack to create a pack that contains only blobs
> from the given tree prefix (and necessary commits/trees to reach
> those blobs).
> 
> "commit-subtree" asks upload-pack to create a pack that contains trees of
> the given prefix (and necessary commits/trees to reach those trees)
> 
> With "subtree" request, Git client may then rewrite commits to create
> a valid commit tree again, so that users can work on it independently.
> When users want to push from such a tree, "commit-subtree" may then be
> used to re-match what users have and what is in upstream, recreate
> proper push commits.

I disagree with a lot of this... but the idea is quite cool.

I like the "subtree" command.  Being able to clone down only part of
the repository is a nice feature, and the implementation of subtree
seems simple enough for the server.  It only has to emit some of
the paths, but the entire commit DAG.  This is pretty simple to
implement server side and is very lightweight.


But I disagree with the client rewriting the commits in order to
work with them locally.  Doing so means you can't take a commit
from your team's issue tracker and look it up.  And any commit
you create can't be pushed back to the server without rewriting.
It's messy for the end-user to work with.

I would prefer doing something more like what we do with shallow
on the client side.  Record in a magic file the path(s) that we
did actually obtain.  During fsck, rev-list, or read-tree the
client skips over any paths that don't match that file's listing.
Then we can keep the same commit SHA-1s, but we won't complain that
there are objects missing.
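
As a rough illustration of that check (nothing like this exists in the
series; struct narrow_spec, inside() and path_is_present() are
hypothetical names, and the prefixes are assumed to be stored without
trailing slashes), a walker could consult the recorded paths like this:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of the "magic file": fetched prefixes. */
struct narrow_spec {
	const char **prefixes;
	size_t nr;
};

/* Returns 1 if `a` equals `b` or lies underneath directory `b`. */
static int inside(const char *a, const char *b)
{
	size_t n = strlen(b);

	if (n == 0)
		return 1; /* everything is inside the root */
	if (strncmp(a, b, n))
		return 0;
	return a[n] == '\0' || a[n] == '/';
}

/*
 * Returns 1 if objects for `path` are expected to exist locally:
 * either the path is under a recorded prefix, or it is a directory
 * leading to one. An empty spec means an ordinary full clone.
 */
static int path_is_present(const struct narrow_spec *spec, const char *path)
{
	size_t i;

	assert(spec && path);
	if (!spec->nr)
		return 1;
	for (i = 0; i < spec->nr; i++) {
		if (inside(path, spec->prefixes[i]) ||
		    inside(spec->prefixes[i], path))
			return 1;
	}
	return 0;
}
```

A tool such as fsck could call path_is_present() before descending into
a tree entry and skip, rather than report, entries for which it
returns 0.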

The downside is, a lot of the client code is impacted, and that
is why nobody has done it yet.  Tools like rebase or cherry-pick
start to behave funny.  What does it mean to rebase or cherry-pick
a commit that has deltas outside of the area you have cloned?
It probably should abort and refuse to execute.  But `git show`
should still work, which implies you need a way to toggle the
diff code to either skip or fail on deltas outside of the shallow
path space.
 
-- 
Shawn.


* Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing)
  2010-07-27 14:46   ` Shawn O. Pearce
@ 2010-07-27 18:51     ` Avery Pennarun
  2010-07-27 22:32       ` Nguyen Thai Ngoc Duy
  2010-07-28  1:53       ` Elijah Newren
  2010-07-27 22:29     ` [PATCH 1/2] upload-pack: support subtree packing Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 10+ messages in thread
From: Avery Pennarun @ 2010-07-27 18:51 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Nguyễn Thái Ngọc Duy, git

On Tue, Jul 27, 2010 at 07:46:05AM -0700, Shawn O. Pearce wrote:

> But I disagree with the client rewriting the commits in order to
> work with them locally.  Doing so means you can't take a commit
> from your team's issue tracker and look it up.  And any commit
> you create can't be pushed back to the server without rewriting.
> It's messy for the end-user to work with.

Yeah, that doesn't sound ideal.  And I wrote git-subtree, which does exactly
that, so I should know :)

> I would prefer doing something more like what we do with shallow
> on the client side.  Record in a magic file the path(s) that we
> did actually obtain.  During fsck, rev-list, or read-tree the
> client skips over any paths that don't match that file's listing.
> Then we can keep the same commit SHA-1s, but we won't complain that
> there are objects missing.

Disclaimer: I've never looked at any of the fetch code.

But I've been thinking that a really elegant way to solve the problem could
be to have a user-configurable "get the missing objects" callback.  If any
part of git that *needs* an object can't find it, it calls this callback to
go try to retrieve it (either just that one object, or it can request to
download the object recursively, i.e. everything it points to).

Then shallow clones could just auto-fill themselves if you really need a
prior version, for example.

It's also conceivable that we could limit this just to blobs: downloading
the complete set of commit objects, and probably even the complete set of
tree objects, is probably not that expensive.  And that would allow someone
to do virtually all operations (other than three-way merges to resolve blob
conflicts) without having the entire repo.

I say it could be user-configurable because that's where you could plug in
a gittorrent, or a tool that just tries fetching from a series of
repositories in turn, etc.
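
A minimal sketch of such a hook, assuming a made-up callback signature
(missing_object_fn, struct object_source and fetch_missing_object() are
hypothetical and not part of git): each configured source is asked in
turn until one of them manages to provide the object. The two stub
sources exist only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback: try to obtain the object named `hex_id`,
 * optionally with everything it points to. Returns 0 on success.
 */
typedef int (*missing_object_fn)(const char *hex_id, int recursive,
				 void *data);

struct object_source {
	missing_object_fn fetch;
	void *data; /* passed back to the callback untouched */
};

/* Ask each source in turn; 0 as soon as one succeeds, -1 otherwise. */
static int fetch_missing_object(const struct object_source *sources,
				size_t nr, const char *hex_id, int recursive)
{
	size_t i;

	assert(hex_id);
	for (i = 0; i < nr; i++) {
		if (sources[i].fetch &&
		    !sources[i].fetch(hex_id, recursive, sources[i].data))
			return 0;
	}
	return -1;
}

/* Stub source that never has the object (e.g. an unreachable mirror). */
static int source_unavailable(const char *hex_id, int recursive, void *data)
{
	(void)hex_id;
	(void)recursive;
	(void)data;
	return -1;
}

/* Stub source that always succeeds and counts how often it was used. */
static int source_counting(const char *hex_id, int recursive, void *data)
{
	(void)hex_id;
	(void)recursive;
	++*(int *)data;
	return 0;
}
```

The object-reading code would call fetch_missing_object() only after a
local lookup has failed, and retry the lookup if it returns 0.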

Have fun,

Avery


* Re: [PATCH 1/2] upload-pack: support subtree packing
  2010-07-27 14:46   ` Shawn O. Pearce
  2010-07-27 18:51     ` Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing) Avery Pennarun
@ 2010-07-27 22:29     ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-07-27 22:29 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Wed, Jul 28, 2010 at 12:46 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> This patch adds a new capability "subtree", which supports two new
>> requests "subtree" and "commit-subtree".
>>
>> "subtree" asks upload-pack to create a pack that contains only blobs
>> from the given tree prefix (and necessary commits/trees to reach
>> those blobs).
>>
>> "commit-subtree" asks upload-pack to create a pack that contains trees of
>> the given prefix (and necessary commits/trees to reach those trees)
>>
>> With "subtree" request, Git client may then rewrite commits to create
>> a valid commit tree again, so that users can work on it independently.
>> When users want to push from such a tree, "commit-subtree" may then be
>> used to re-match what users have and what is in upstream, recreate
>> proper push commits.
>
> I disagree with a lot of this... but the idea is quite cool.
>
> I like the "subtree" command, being able to clone down only part of
> the repository is a nice feature, and the implementation of subtree
> seems simple enough for the server.  It only has to emit some of
> the paths, but the entire commit DAG.  This is pretty simple to
> implement server side and is very lightweight.

Another point is that the server side can disallow full clones
completely and grant permission to clone on a per-directory basis.
Enterprise users would love this.

> But I disagree with the client rewriting the commits in order to
> work with them locally.  Doing so means you can't take a commit
> from your team's issue tracker and look it up.  And any commit
> you create can't be pushed back to the server without rewriting.
> It's messy for the end-user to work with.

That's what happens with git-subtree in its current form (I don't know
much about git-subtree though). But I guess if they can use
git-subtree as it is now, they can live with subtree clone+git-subtree
just fine.

> I would prefer doing something more like what we do with shallow
> on the client side.  Record in a magic file the path(s) that we
> did actually obtain.  During fsck, rev-list, or read-tree the
> client skips over any paths that don't match that file's listing.
> Then we can keep the same commit SHA-1s, but we won't complain that
> there are objects missing.

That's another option. With all trees present, sparse checkout can be
used, as long as you limit your operations to a subdirectory. Full-tree
commands like git-fsck can be taught to recognize a subtree clone and
stop complaining about missing objects. The downloaded pack would be
bigger (I don't know by how much). And it also defeats the enterprise
point above.

> The downside is, a lot of the client code is impacted, and that
> is why nobody has done it yet.  Tools like rebase or cherry-pick
> start to behave funny.  What does it mean to rebase or cherry-pick
> a commit that has deltas outside of the area you have cloned?
> It probably should abort and refuse to execute.  But `git show`
> should still work, which implies you need a way to toggle the
> diff code to either skip or fail on deltas outside of the shallow
> path space.

Where do those deltas come from? I thought, with proper path limiting
in upload-pack, pack-objects would never generate anything that needs
things outside the area?

Sounds to me like git-subtree for the short term, and no git-subtree in
the long term :)
-- 
Duy


* Re: Configurable callbacks for missing objects (we Re: upload-pack:  support subtree packing)
  2010-07-27 18:51     ` Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing) Avery Pennarun
@ 2010-07-27 22:32       ` Nguyen Thai Ngoc Duy
  2010-07-28  1:53       ` Elijah Newren
  1 sibling, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-07-27 22:32 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Shawn O. Pearce, git

2010/7/28 Avery Pennarun <apenwarr@gmail.com>:
> But I've been thinking that a really elegant way to solve the problem could
> be to have a user-configurable "get the missing objects" callback.  If any
> part of git that *needs* an object can't find it, it calls this callback to
> go try to retrieve it (either just that one object, or it can request to
> download the object recursively, ie. everything it points to).
>
> Then shallow clones could just auto-fill themselves if you really need a
> prior version, for example.

I think that's what lazy clone does in [1].

[1] http://thread.gmane.org/gmane.comp.version-control.git/73117/focus=73935
-- 
Duy


* Re: Configurable callbacks for missing objects (we Re: upload-pack:  support subtree packing)
  2010-07-27 18:51     ` Configurable callbacks for missing objects (we Re: upload-pack: support subtree packing) Avery Pennarun
  2010-07-27 22:32       ` Nguyen Thai Ngoc Duy
@ 2010-07-28  1:53       ` Elijah Newren
  2010-07-28  2:00         ` Avery Pennarun
  1 sibling, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2010-07-28  1:53 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Shawn O. Pearce, Nguyễn Thái Ngọc, git

2010/7/27 Avery Pennarun <apenwarr@gmail.com>:
> But I've been thinking that a really elegant way to solve the problem could
> be to have a user-configurable "get the missing objects" callback.  If any
> part of git that *needs* an object can't find it, it calls this callback to
> go try to retrieve it (either just that one object, or it can request to
> download the object recursively, ie. everything it points to).
>
> Then shallow clones could just auto-fill themselves if you really need a
> prior version, for example.

What counts as "needing" an object?  Does 'git log -Sfoo' or 'git log
--stat' need all missing blobs?  I'd personally dislike having such
commands automatically result in huge downloads, but I'd probably
dislike the automatic downloading in general so perhaps I'm just a
misfit for the lazy clone usecase.  It's still an interesting question
though -- what counts as needed?


* Re: Configurable callbacks for missing objects (we Re: upload-pack:  support subtree packing)
  2010-07-28  1:53       ` Elijah Newren
@ 2010-07-28  2:00         ` Avery Pennarun
  0 siblings, 0 replies; 10+ messages in thread
From: Avery Pennarun @ 2010-07-28  2:00 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Shawn O. Pearce, Nguyễn Thái Ngọc, git

2010/7/27 Elijah Newren <newren@gmail.com>:
> 2010/7/27 Avery Pennarun <apenwarr@gmail.com>:
>> But I've been thinking that a really elegant way to solve the problem could
>> be to have a user-configurable "get the missing objects" callback.  If any
>> part of git that *needs* an object can't find it, it calls this callback to
>> go try to retrieve it (either just that one object, or it can request to
>> download the object recursively, ie. everything it points to).
>>
>> Then shallow clones could just auto-fill themselves if you really need a
>> prior version, for example.
>
> What counts as "needing" an object?  Does 'git log -Sfoo' or 'git log
> --stat' need all missing blobs?  I'd personally dislike having such
> commands automatically result in huge downloads, but I'd probably
> dislike the automatic downloading in general so perhaps I'm just a
> misfit for the lazy clone usecase.  It's still an interesting question
> though -- what counts as needed?

I would say no by default (unless maybe a config option is set) but
you'd want to be able to force it on for a particular command.

Have fun,

Avery

