* [PATCH 0/4] exact ref-matching for fetch-pack @ 2011-12-13 0:39 Jeff King 2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King ` (3 more replies) 0 siblings, 4 replies; 8+ messages in thread From: Jeff King @ 2011-12-13 0:39 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki I was passed a bug report of clone failing on a repo with an oddly named ref in it: "refs/for/refs/heads/master". This is probably an error somebody made while pushing and should simply be deleted. However, it's curious that "clone" would fail on this, since it should be ignoring everything outside the "refs/heads/*" refspec. It turns out that during the fetch-pack phase, we accidentally ask for the sha1 of "refs/for/refs/heads/master" instead of that of "refs/heads/master". This can cause "clone" to fail, as we may not have asked for the object pointed to by "refs/heads/master" at all. This behavior is due to some questionable suffix-matching deep inside fetch-pack. The code in question dates back to the beginning of git; I think it simply hasn't triggered much because the refname you need to have is so specific (you must be asking to fetch a ref "refs/X", have "refs/Y/X" on the remote, and "Y" must come alphabetically before "X", since we match refs in alphabetical order). As I said, this is probably just a silly one-off error. But I could imagine this happening legitimately. Here are two possible scenarios: 1. You are mirroring some meta-information about your refs in a parallel namespace. E.g., "refs/descriptions/refs/heads/master" contains information about "refs/heads/master". 2. One of the things we've discussed for future git is mirroring the remote ref namespaces more literally. E.g., having something like "refs/remotes/origin/refs/heads/master". That won't actually trigger this bug because "heads" is alphabetically before "remotes". But "tags", for example, is not. This could possibly be considered a behavior regression for users of the fetch-pack command line. See patches 2 and 3 for details. [1/4]: drop "match" parameter from get_remote_heads [2/4]: t5500: give fully-qualified refs to fetch-pack [3/4]: fetch-pack: match refs exactly [4/4]: connect.c: drop path_match function -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/4] drop "match" parameter from get_remote_heads 2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King @ 2011-12-13 0:41 ` Jeff King 2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Jeff King @ 2011-12-13 0:41 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki The get_remote_heads function reads the list of remote refs during git protocol session. It dates all the way back to def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04). At that time, the idea was to come up with a list of refs we were interested in, and then filter the list as we got it from the remote side. Later, 1baaae5 (Make maximal use of the remote refs, 2005-10-28) stopped filtering at the get_remote_heads layer, letting us use the non-matching refs to find common history. As a result, all callers now simply pass an empty match list (and any future callers will want to do the same). So let's drop these now-useless parameters. Signed-off-by: Jeff King <peff@peff.net> --- This one isn't necessary for the bugfix, but since it is the only other caller of path_match besides fetch-pack, it gives us freedom to modify or get rid of path_match later. builtin/fetch-pack.c | 2 +- builtin/send-pack.c | 3 +-- cache.h | 2 +- connect.c | 3 --- remote-curl.c | 2 +- transport.c | 7 +++---- 6 files changed, 7 insertions(+), 12 deletions(-) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index c6bc8eb..46688dc 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -976,7 +976,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) args.verbose ? CONNECT_VERBOSE : 0); } - get_remote_heads(fd[0], &ref, 0, NULL, 0, NULL); + get_remote_heads(fd[0], &ref, 0, NULL); ref = fetch_pack(&args, fd, conn, ref, dest, nr_heads, heads, pack_lockfile_ptr); diff --git a/builtin/send-pack.c b/builtin/send-pack.c index e0b8030..cd1115f 100644 --- a/builtin/send-pack.c +++ b/builtin/send-pack.c @@ -494,8 +494,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix) memset(&extra_have, 0, sizeof(extra_have)); - get_remote_heads(fd[0], &remote_refs, 0, NULL, REF_NORMAL, - &extra_have); + get_remote_heads(fd[0], &remote_refs, REF_NORMAL, &extra_have); transport_verify_remote_names(nr_refspecs, refspecs); diff --git a/cache.h b/cache.h index 8c98d05..408e880 100644 --- a/cache.h +++ b/cache.h @@ -1034,7 +1034,7 @@ struct extra_have_objects { int nr, alloc; unsigned char (*array)[20]; }; -extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *); +extern struct ref **get_remote_heads(int in, struct ref **list, unsigned int flags, struct extra_have_objects *); extern int server_supports(const char *feature); extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); diff --git a/connect.c b/connect.c index 51990fa..48df90b 100644 --- a/connect.c +++ b/connect.c @@ -53,7 +53,6 @@ static void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1 * Read all the refs from the other end */ struct ref **get_remote_heads(int in, struct ref **list, - int nr_match, char **match, unsigned int flags, struct extra_have_objects *extra_have) { @@ -92,8 +91,6 @@ struct ref **get_remote_heads(int in, struct ref **list, if (!check_ref(name, name_len, flags)) continue; - if (nr_match && !path_match(name, nr_match, match)) - continue; ref = alloc_ref(buffer + 41); hashcpy(ref->old_sha1, old_sha1); *list = ref; diff --git a/remote-curl.c b/remote-curl.c index 0e720ee..94dc488 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -200,7 +200,7 @@ static struct ref *parse_git_refs(struct discovery *heads) if (start_async(&async)) die("cannot start thread to parse advertised refs"); - get_remote_heads(async.out, &list, 0, NULL, 0, NULL); + get_remote_heads(async.out, &list, 0, NULL); close(async.out); if (finish_async(&async)) die("ref parsing thread failed"); diff --git a/transport.c b/transport.c index 51814b5..c2245d4 100644 --- a/transport.c +++ b/transport.c @@ -502,7 +502,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus struct ref *refs; connect_setup(transport, for_push, 0); - get_remote_heads(data->fd[0], &refs, 0, NULL, + get_remote_heads(data->fd[0], &refs, for_push ? REF_NORMAL : 0, &data->extra_have); data->got_remote_heads = 1; @@ -537,7 +537,7 @@ static int fetch_refs_via_pack(struct transport *transport, if (!data->got_remote_heads) { connect_setup(transport, 0, 0); - get_remote_heads(data->fd[0], &refs_tmp, 0, NULL, 0, NULL); + get_remote_heads(data->fd[0], &refs_tmp, 0, NULL); data->got_remote_heads = 1; } @@ -772,8 +772,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re struct ref *tmp_refs; connect_setup(transport, 1, 0); - get_remote_heads(data->fd[0], &tmp_refs, 0, NULL, REF_NORMAL, - NULL); + get_remote_heads(data->fd[0], &tmp_refs, REF_NORMAL, NULL); data->got_remote_heads = 1; } -- 1.7.8.13.g74677 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack 2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King 2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King @ 2011-12-13 0:44 ` Jeff King 2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King 2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King 3 siblings, 0 replies; 8+ messages in thread From: Jeff King @ 2011-12-13 0:44 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki The fetch-pack documentation is very clear that refs given on the command line are to be full refs: <refs>...:: The remote heads to update from. This is relative to $GIT_DIR (e.g. "HEAD", "refs/heads/master"). When unspecified, update from all heads the remote side has. and this has been the case since fetch-pack was originally documented in 8b3d9dc ([PATCH] Documentation: clone/fetch/upload., 2005-07-14). Let's follow our own documentation to set a good example, and to avoid breaking when this restriction is enforced in the next patch. Signed-off-by: Jeff King <peff@peff.net> --- This patch by itself isn't a big deal for backwards compatibility But if we are violating the constraints in the documentation, then I worry that others are, too. On the other hand, I seriously doubt anyone is actually using fetch-pack directly anymore at all. t/t5500-fetch-pack.sh | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index bafcca7..9bf69e9 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -97,7 +97,7 @@ test_expect_success 'setup' ' git symbolic-ref HEAD refs/heads/B ' -pull_to_client 1st "B A" $((11*3)) +pull_to_client 1st "refs/heads/B refs/heads/A" $((11*3)) test_expect_success 'post 1st pull setup' ' add A11 $A10 && @@ -110,9 +110,9 @@ test_expect_success 'post 1st pull setup' ' done ' -pull_to_client 2nd "B" $((64*3)) +pull_to_client 2nd "refs/heads/B" $((64*3)) -pull_to_client 3rd "A" $((1*3)) +pull_to_client 3rd "refs/heads/A" $((1*3)) test_expect_success 'clone shallow' ' git clone --depth 2 "file://$(pwd)/." shallow -- 1.7.8.13.g74677 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/4] fetch-pack: match refs exactly 2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King 2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King 2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King @ 2011-12-13 0:48 ` Jeff King 2011-12-13 0:54 ` Jeff King 2011-12-15 21:46 ` Junio C Hamano 2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King 3 siblings, 2 replies; 8+ messages in thread From: Jeff King @ 2011-12-13 0:48 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki When we are determining the list of refs to fetch via fetch-pack, we have two sets of refs to compare: those on the remote side, and a "match" list of things we want to fetch. We iterate through the remote refs alphabetically, seeing if each one is wanted by the "match" list. Since def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04), we have used the "path_match" function to do a suffix match, where a remote ref is considered wanted if any of the "match" elements is a suffix of the remote refname. This enables callers of fetch-pack to specify unqualified refs and have them matched up with remote refs (e.g., ask for "A" and get remote's "refs/heads/A"). However, if you provide a fully qualified ref, then there are corner cases where we provide the wrong answer. For example, given a remote with two refs: refs/foo/refs/heads/master refs/heads/master asking for "refs/heads/master" will first match "refs/foo/refs/heads/master" by the suffix rule, and we will erroneously fetch it instead of refs/heads/master. As it turns out, all callers of fetch_pack do provide fully-qualified refs for the match list. There are two ways fetch_pack can get match lists: 1. Through the transport code (i.e., via git-fetch) 2. On the command-line of git-fetch-pack In the first case, we will always be providing the names of fully-qualified refs from "struct ref" objects. We will have pre-matched those ref objects already (since we have to handle more advanced matching, like wildcard refspecs), and are just providing a list of the refs whose objects we need. In the second case, users could in theory be providing non-qualified refs on the command-line. However, the fetch-pack documentation claims that refs should be fully qualified (and has always done so since it was written in 2005). Let's change this path_match call to simply check for string equality, matching what the callers of fetch_pack are expecting. Signed-off-by: Jeff King <peff@peff.net> --- This is obviously the one that can break existing fetch-pack users. I doubt they exist. If they do, there are a few alternatives: 1. Come up with some more sane rules for path_match (e.g., try full strings first, use full-string matching for things starting with "refs/", etc). 2. Leave the matching in-place for git-fetch-pack, but use exact matching for internal users that will always provide qualified refs (i.e., "git fetch"). I don't personally think it's worth the trouble. builtin/fetch-pack.c | 13 +++++++++---- t/t5527-fetch-odd-refs.sh | 29 +++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 4 deletions(-) create mode 100755 t/t5527-fetch-odd-refs.sh diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 46688dc..6207ecd 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match) continue; } else { - int order = path_match(ref->name, nr_match, match); - if (order) { - return_refs[order-1] = ref; - continue; /* we will link it later */ + int i; + for (i = 0; i < nr_match; i++) { + if (!strcmp(ref->name, match[i])) { + match[i][0] = '\0'; + return_refs[i] = ref; + break; + } } + if (i < nr_match) + continue; /* we will link it later */ } free(ref); } diff --git a/t/t5527-fetch-odd-refs.sh b/t/t5527-fetch-odd-refs.sh new file mode 100755 index 0000000..edea9f9 --- /dev/null +++ b/t/t5527-fetch-odd-refs.sh @@ -0,0 +1,29 @@ +#!/bin/sh + +test_description='test fetching of oddly-named refs' +. ./test-lib.sh + +# afterwards we will have: +# HEAD - two +# refs/for/refs/heads/master - one +# refs/heads/master - three +test_expect_success 'setup repo with odd suffix ref' ' + echo content >file && + git add . && + git commit -m one && + git update-ref refs/for/refs/heads/master HEAD && + echo content >>file && + git commit -a -m two && + echo content >>file && + git commit -a -m three && + git checkout HEAD^ +' + +test_expect_success 'suffix ref is ignored during fetch' ' + git clone --bare file://"$PWD" suffix && + echo three >expect && + git --git-dir=suffix log -1 --format=%s refs/heads/master >actual && + test_cmp expect actual +' + +test_done -- 1.7.8.13.g74677 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 3/4] fetch-pack: match refs exactly 2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King @ 2011-12-13 0:54 ` Jeff King 2011-12-15 21:46 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Jeff King @ 2011-12-13 0:54 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki On Mon, Dec 12, 2011 at 07:48:08PM -0500, Jeff King wrote: > diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c > index 46688dc..6207ecd 100644 > --- a/builtin/fetch-pack.c > +++ b/builtin/fetch-pack.c > @@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match) > continue; > } > else { > - int order = path_match(ref->name, nr_match, match); > - if (order) { > - return_refs[order-1] = ref; > - continue; /* we will link it later */ > + int i; > + for (i = 0; i < nr_match; i++) { > + if (!strcmp(ref->name, match[i])) { > + match[i][0] = '\0'; > + return_refs[i] = ref; > + break; > + } > } > + if (i < nr_match) > + continue; /* we will link it later */ Astute readers may notice that our matching scheme is now something like: for ref in refs for match in matches strcmp(ref, match) If you are fetching everything, that's O(n^2) strcmps (where n is the number of remote refs). If we sorted the match list (the refs list is already sorted), we could turn it into an O(n lg n) sort+merge. This is an optimization we couldn't do with the old suffix-matching rule. I doubt it matters in any practical case. Even for our crazy 100K ref cases at GitHub, we don't tend to actually fetch all of those at one time. So it is more like O(n * m), where m is probably in the dozens at most. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 3/4] fetch-pack: match refs exactly 2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King 2011-12-13 0:54 ` Jeff King @ 2011-12-15 21:46 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2011-12-15 21:46 UTC (permalink / raw) To: Jeff King; +Cc: git, Kevin Sawicki Jeff King <peff@peff.net> writes: > This is obviously the one that can break existing fetch-pack users. I > doubt they exist. If they do, there are a few alternatives: > > 1. Come up with some more sane rules for path_match (e.g., try full > strings first, use full-string matching for things starting with > "refs/", etc). > > 2. Leave the matching in-place for git-fetch-pack, but use exact > matching for internal users that will always provide qualified refs > (i.e., "git fetch"). I think the latter is the sane thing to do _if_ this becomes an issue, and as you mention, it is in line with what "git fetch" wrapper already does. Given that fetch-pack was meant to be driven by "git fetch" wrapper that turns the command line and other refspecs into full refnames on the remote end before calling it, and also as you mentionied that it is clearly documented as "relative to $GIT_DIR", I do not think we should support the tail-match semantics at all in the first place. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 4/4] connect.c: drop path_match function 2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King ` (2 preceding siblings ...) 2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King @ 2011-12-13 0:49 ` Jeff King 2011-12-13 3:23 ` Michael Haggerty 3 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2011-12-13 0:49 UTC (permalink / raw) To: git; +Cc: Kevin Sawicki This function was used for comparing local and remote ref names during fetch (which makes it a candidate for "most confusingly named function of the year"). It no longer has any callers, so let's get rid of it. Signed-off-by: Jeff King <peff@peff.net> --- I just did the exact-string match inline in the previous patch. I could also have modified path_match to do it. But really, I can't think of a worse name for a global function in a system which is all about managing content in paths. Unless, you know, it actually matched paths. Which it doesn't. cache.h | 1 - connect.c | 21 --------------------- 2 files changed, 0 insertions(+), 22 deletions(-) diff --git a/cache.h b/cache.h index 408e880..2ad063f 100644 --- a/cache.h +++ b/cache.h @@ -1029,7 +1029,6 @@ struct ref { extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags); extern int finish_connect(struct child_process *conn); extern int git_connection_is_socket(struct child_process *conn); -extern int path_match(const char *path, int nr, char **match); struct extra_have_objects { int nr, alloc; unsigned char (*array)[20]; diff --git a/connect.c b/connect.c index 48df90b..2a0a040 100644 --- a/connect.c +++ b/connect.c @@ -105,27 +105,6 @@ int server_supports(const char *feature) strstr(server_capabilities, feature) != NULL; } -int path_match(const char *path, int nr, char **match) -{ - int i; - int pathlen = strlen(path); - - for (i = 0; i < nr; i++) { - char *s = match[i]; - int len = strlen(s); - - if (!len || len > pathlen) - continue; - if (memcmp(path + pathlen - len, s, len)) - continue; - if (pathlen > len && path[pathlen - len - 1] != '/') - continue; - *s = 0; - return (i + 1); - } - return 0; -} - enum protocol { PROTO_LOCAL = 1, PROTO_SSH, -- 1.7.8.13.g74677 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 4/4] connect.c: drop path_match function 2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King @ 2011-12-13 3:23 ` Michael Haggerty 0 siblings, 0 replies; 8+ messages in thread From: Michael Haggerty @ 2011-12-13 3:23 UTC (permalink / raw) To: Jeff King; +Cc: git, Kevin Sawicki On 12/13/2011 01:49 AM, Jeff King wrote: > I just did the exact-string match inline in the previous patch. I could > also have modified path_match to do it. But really, I can't think of a > worse name for a global function in a system which is all about > managing content in paths. Unless, you know, it actually matched paths. > Which it doesn't. The tacit assumption that a reference is always a file is all over the code and is often confusing. My [PATCH v2 02/51] refs: rename "refname" variables is a step towards making the distinction more explicit, at least in the refs code. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-12-15 21:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King 2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King 2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King 2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King 2011-12-13 0:54 ` Jeff King 2011-12-15 21:46 ` Junio C Hamano 2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King 2011-12-13 3:23 ` Michael Haggerty
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).