git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] archive: let remote clients get reachable commits
@ 2013-02-21 14:24 Sergey Segeev
  2013-02-21 15:52 ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Segeev @ 2013-02-21 14:24 UTC (permalink / raw)
  To: git, Jeff King, Junio C Hamano; +Cc: Sergey Segeev

Some time we need to get valid commit without a ref but with proper
tree-ish, now we can't do that.

This patch allow upload-archive's to use reachability checking
rather than checking that is a ref. This means a remote client can
fetch a tip of any valid sha1 or tree-ish.
---
 archive.c           | 24 +++++++++++++-----------
 t/t5000-tar-tree.sh |  2 ++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/archive.c b/archive.c
index 93e00bb..0a48985 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "refs.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [options] <tree-ish> [<path>...]"),
@@ -241,6 +242,13 @@ static void parse_pathspec_arg(const char **pathspec,
 	}
 }
 
+static int check_reachable(const char *refname, const unsigned char *sha1,
+		int flag, void *cb_data)
+{
+
+	return in_merge_bases(cb_data, lookup_commit_reference_gently(sha1, 1));
+}
+
 static void parse_treeish_arg(const char **argv,
 		struct archiver_args *ar_args, const char *prefix,
 		int remote)
@@ -252,22 +260,16 @@ static void parse_treeish_arg(const char **argv,
 	const struct commit *commit;
 	unsigned char sha1[20];
 
-	/* Remotes are only allowed to fetch actual refs */
-	if (remote) {
-		char *ref = NULL;
-		const char *colon = strchr(name, ':');
-		int refnamelen = colon ? colon - name : strlen(name);
-
-		if (!dwim_ref(name, refnamelen, sha1, &ref))
-			die("no such ref: %.*s", refnamelen, name);
-		free(ref);
-	}
-
 	if (get_sha1(name, sha1))
 		die("Not a valid object name");
 
 	commit = lookup_commit_reference_gently(sha1, 1);
 	if (commit) {
+
+		/* Remotes are only allowed to fetch actual objects */
+		if (remote && !for_each_ref(check_reachable, (void *)commit))
+			die("Not a valid object name");
+
 		commit_sha1 = commit->object.sha1;
 		archive_time = commit->date;
 	} else {
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index e7c240f..fc35406 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -194,6 +194,8 @@ test_expect_success 'clients cannot access unreachable commits' '
 	sha1=`git rev-parse HEAD` &&
 	git reset --hard HEAD^ &&
 	git archive $sha1 >remote.tar &&
+	git archive --remote=. $sha1 >remote.tar &&
+	git tag -d unreachable &&
 	test_must_fail git archive --remote=. $sha1 >remote.tar
 '
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
  2013-02-21 14:24 [PATCH] archive: let remote clients get reachable commits Sergey Segeev
@ 2013-02-21 15:52 ` Jeff King
       [not found]   ` <995301361532360@web22h.yandex.ru>
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2013-02-21 15:52 UTC (permalink / raw)
  To: Sergey Segeev; +Cc: git, Junio C Hamano

On Thu, Feb 21, 2013 at 06:24:03PM +0400, Sergey Segeev wrote:

> Some time we need to get valid commit without a ref but with proper
> tree-ish, now we can't do that.
> 
> This patch allow upload-archive's to use reachability checking
> rather than checking that is a ref. This means a remote client can
> fetch a tip of any valid sha1 or tree-ish.

That sounds like a good goal, but...

> @@ -252,22 +260,16 @@ static void parse_treeish_arg(const char **argv,
>  	const struct commit *commit;
>  	unsigned char sha1[20];
>  
> -	/* Remotes are only allowed to fetch actual refs */
> -	if (remote) {
> -		char *ref = NULL;
> -		const char *colon = strchr(name, ':');
> -		int refnamelen = colon ? colon - name : strlen(name);
> -
> -		if (!dwim_ref(name, refnamelen, sha1, &ref))
> -			die("no such ref: %.*s", refnamelen, name);
> -		free(ref);
> -	}

The point of this was to allow "commit:path" syntax, and check that
commit pointed to a ref. The natural extension would be to also check
that the commit part is reachable.

I think it is also not sufficient to just check whether the left-hand
side of the colon is a reachable commit. You would also want to handle
non-commits which are directly pointed-to by a ref or its tag (e.g.,
think of a tag pointing directly to a tree, like the v2.6.11 tag in the
linux repo).

Your check...

>  	commit = lookup_commit_reference_gently(sha1, 1);
>  	if (commit) {
> +
> +		/* Remotes are only allowed to fetch actual objects */
> +		if (remote && !for_each_ref(check_reachable, (void *)commit))
> +			die("Not a valid object name");
> +
>  		commit_sha1 = commit->object.sha1;
>  		archive_time = commit->date;
>  	} else {

...will do nothing if we do not have a commit reference (e.g., an arbitrary
sha1, or commit:path syntax). We follow the "else" of this branch, and
allow arbitrary sha1's to be fetched (like "unreachable_sha1:subdir").

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
       [not found]   ` <995301361532360@web22h.yandex.ru>
@ 2013-02-22 17:10     ` Junio C Hamano
  2013-02-22 17:27       ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-02-22 17:10 UTC (permalink / raw)
  To: Sergey Sergeev; +Cc: Jeff King, git@vger.kernel.org

Sergey Sergeev <gurugray@yandex.ru> writes:

[jc: please do not top-post]

> You are right,
> I'll rethink this patch and write some test for this cases.

Thanks.

Note that this is harder to implement than one would naïvely think,
if one aims for a very generic solution, without walking the whole
history.

I personally think that it is OK to limit the scope to expressions
that start from the tip of ref and expressions that start with the
SHA-1 at the tip of ref, e.g.

    master~12:Documentation
    v2.6.11:arch/alpha
    5dc01c595e6c6ec9ccda	# tag v2.6.11
    26791a8bcf0e6d33f43a:arch	# tag v2.6.12
    26791a8bcf0~12:arch		# starting at 26791a8b and dig down

are OK, while forbidding the following:

    c39ae07f393806ccf406        # tree of tag v2.6.11
    9ee1c939d1cb936b1f98	# commit v2.6.12^0
    9ee1c939d1cb936b1f98:	# tree of commit v2.6.12^0
    9ee1c939d1cb936b1f98:arch	# subtree of commit v2.6.12^0

which will make it significantly easier to implement the necessary
validation in a robust way.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
  2013-02-22 17:10     ` Junio C Hamano
@ 2013-02-22 17:27       ` Jeff King
  2013-02-22 18:06         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2013-02-22 17:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Sergeev, git@vger.kernel.org

On Fri, Feb 22, 2013 at 09:10:46AM -0800, Junio C Hamano wrote:

> I personally think that it is OK to limit the scope to expressions
> that start from the tip of ref and expressions that start with the
> SHA-1 at the tip of ref, e.g.
> 
>     master~12:Documentation
>     v2.6.11:arch/alpha
>     5dc01c595e6c6ec9ccda	# tag v2.6.11
>     26791a8bcf0e6d33f43a:arch	# tag v2.6.12
>     26791a8bcf0~12:arch		# starting at 26791a8b and dig down
> 
> are OK, while forbidding the following:
> 
>     c39ae07f393806ccf406        # tree of tag v2.6.11
>     9ee1c939d1cb936b1f98	# commit v2.6.12^0
>     9ee1c939d1cb936b1f98:	# tree of commit v2.6.12^0
>     9ee1c939d1cb936b1f98:arch	# subtree of commit v2.6.12^0
> 
> which will make it significantly easier to implement the necessary
> validation in a robust way.

How are you proposing to verify master~12 in that example? Because
during parsing, it starts with "master", and we remember that? Or
because you are doing a reachability traversal on all refs after we
parse?

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
  2013-02-22 17:27       ` Jeff King
@ 2013-02-22 18:06         ` Junio C Hamano
  2013-02-22 18:26           ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-02-22 18:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Sergey Sergeev, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> How are you proposing to verify master~12 in that example? Because
> during parsing, it starts with "master", and we remember that?

By not cheating (i.e. using get_sha1()), but making sure you can
parse "master" and the adornment on it "~12" is something sane.

That is why I said "this is harder than one would naively think, but
limiting will make it significantly easier".  I didn't say that it
would become "trivial", did I?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
  2013-02-22 18:06         ` Junio C Hamano
@ 2013-02-22 18:26           ` Jeff King
  2013-02-22 19:15             ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2013-02-22 18:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Sergeev, git@vger.kernel.org

On Fri, Feb 22, 2013 at 10:06:56AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > How are you proposing to verify master~12 in that example? Because
> > during parsing, it starts with "master", and we remember that?
> 
> By not cheating (i.e. using get_sha1()), but making sure you can
> parse "master" and the adornment on it "~12" is something sane.

So, like these patches:

  http://article.gmane.org/gmane.comp.version-control.git/188386

  http://article.gmane.org/gmane.comp.version-control.git/188387

? They do not allow arbitrary sha1s that happen to point to branch tips,
but I am not sure whether that is something people care about or not.

> That is why I said "this is harder than one would naively think, but
> limiting will make it significantly easier".  I didn't say that it
> would become "trivial", did I?

I'm not implying it would be trivial. It was an honest question, since
you did not seem to want to do the pass-more-information-out-of-get-sha1
approach last time this came up.

Even though those patches above are from me, I've come to the conclusion
that the best thing to do is to harmonize with upload-pack. Then you
never have the "well, but I could fetch it, so why won't upload-archive
let me get it" argument. Something like:

  1. split name at first colon (like we already do)

  2. make sure the left-hand side is reachable according to the same
     rules that upload-pack uses. Right we just say "is it a ref". It
     should be:

      2a. if it is a commit-ish, is it reachable from a ref?

      2b. otherwise, is it pointed to directly by a ref?

  3. Abort if it's not reachable. Abort if it's not a tree-ish. No
     checks necessary on the right-hand side, because a path lookup in a
     tree-ish is always reachable from the tree-ish. I.e., the same rule
     we have now.

I did not check if upload-pack will respect a "want" line for an object
accessible only by peeling a tag. But an obvious 2c could be "is it
accessible by peeling the refs?"

That leaves the only inaccessible thing as direct-sha1s of trees and
blobs that are reachable from commits. But you also cannot ask for those
directly via upload-pack, and I do not think it's worth it to do the
much more expensive reachability check to verify those (OTOH, it is no
more expensive than the current "counting objects" for a clone, and we
could do it only as a fallback when cheaper checks do not work).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] archive: let remote clients get reachable commits
  2013-02-22 18:26           ` Jeff King
@ 2013-02-22 19:15             ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2013-02-22 19:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Sergey Sergeev, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> On Fri, Feb 22, 2013 at 10:06:56AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > How are you proposing to verify master~12 in that example? Because
>> > during parsing, it starts with "master", and we remember that?
>> 
>> By not cheating (i.e. using get_sha1()), but making sure you can
>> parse "master" and the adornment on it "~12" is something sane.
>
> So, like these patches:
>
>   http://article.gmane.org/gmane.comp.version-control.git/188386
>
>   http://article.gmane.org/gmane.comp.version-control.git/188387
>
> ? They do not allow arbitrary sha1s that happen to point to branch tips,
> but I am not sure whether that is something people care about or not.
>
>> That is why I said "this is harder than one would naively think, but
>> limiting will make it significantly easier".  I didn't say that it
>> would become "trivial", did I?
>
> I'm not implying it would be trivial. It was an honest question, since
> you did not seem to want to do the pass-more-information-out-of-get-sha1
> approach last time this came up.

That does not match my recollection ($gmane/188427).  In fact, I had
those two patches you quoted earlier in mind when I wrote the "limit
it and it will become easier" response.

> Even though those patches above are from me, I've come to the conclusion
> that the best thing to do is to harmonize with upload-pack. Then you
> never have the "well, but I could fetch it, so why won't upload-archive
> let me get it" argument.

That sounds like a sensible yardstick.

>   1. split name at first colon (like we already do)
>
>   2. make sure the left-hand side is reachable according to the same
>      rules that upload-pack uses.

Well, "upload-pack" under the hood operates on a bare 40-hex.  If
you mean to do "split name at first colon, run get_sha1() on the
LHS" in step 1., I would agree this is a good direction to go.

>      Right we just say "is it a ref". It should be:

With s/should/could optionally/, I would agree.

> That leaves the only inaccessible thing as direct-sha1s of trees and
> blobs that are reachable from commits.

Yes (with s/are reachable/are only reachable/).

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-02-22 19:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-21 14:24 [PATCH] archive: let remote clients get reachable commits Sergey Segeev
2013-02-21 15:52 ` Jeff King
     [not found]   ` <995301361532360@web22h.yandex.ru>
2013-02-22 17:10     ` Junio C Hamano
2013-02-22 17:27       ` Jeff King
2013-02-22 18:06         ` Junio C Hamano
2013-02-22 18:26           ` Jeff King
2013-02-22 19:15             ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).