* [PATCH 0/4] exact ref-matching for fetch-pack
@ 2011-12-13 0:39 Jeff King
2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:39 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
I was passed a bug report of clone failing on a repo with an oddly named
ref in it: "refs/for/refs/heads/master". This is probably an error
somebody made while pushing and should simply be deleted. However, it's
curious that "clone" would fail on this, since it should be ignoring
everything outside the "refs/heads/*" refspec.
It turns out that during the fetch-pack phase, we accidentally ask for
the sha1 of "refs/for/refs/heads/master" instead of that of
"refs/heads/master". This can cause "clone" to fail, as we may not have
asked for the object pointed to by "refs/heads/master" at all. This
behavior is due to some questionable suffix-matching deep inside
fetch-pack. The code in question dates back to the beginning of git; I
think it simply hasn't triggered much because the refname you need to
have is so specific (you must be asking to fetch a ref "refs/X", have
"refs/Y/X" on the remote, and "Y" must come alphabetically before "X",
since we match refs in alphabetical order).
As I said, this is probably just a silly one-off error. But I could
imagine this happening legitimately. Here are two possible scenarios:
1. You are mirroring some meta-information about your refs in a
parallel namespace. E.g., "refs/descriptions/refs/heads/master"
contains information about "refs/heads/master".
2. One of the things we've discussed for future git is mirroring the
remote ref namespaces more literally. E.g., having something like
"refs/remotes/origin/refs/heads/master". That won't actually
trigger this bug because "heads" is alphabetically before
"remotes". But "tags", for example, is not.
This could possibly be considered a behavior regression for users of the
fetch-pack command line. See patches 2 and 3 for details.
[1/4]: drop "match" parameter from get_remote_heads
[2/4]: t5500: give fully-qualified refs to fetch-pack
[3/4]: fetch-pack: match refs exactly
[4/4]: connect.c: drop path_match function
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/4] drop "match" parameter from get_remote_heads
2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King
@ 2011-12-13 0:41 ` Jeff King
2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:41 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
The get_remote_heads function reads the list of remote refs
during git protocol session. It dates all the way back to
def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04).
At that time, the idea was to come up with a list of refs we
were interested in, and then filter the list as we got it
from the remote side.
Later, 1baaae5 (Make maximal use of the remote refs,
2005-10-28) stopped filtering at the get_remote_heads layer,
letting us use the non-matching refs to find common history.
As a result, all callers now simply pass an empty match
list (and any future callers will want to do the same). So
let's drop these now-useless parameters.
Signed-off-by: Jeff King <peff@peff.net>
---
This one isn't necessary for the bugfix, but since it is the only other
caller of path_match besides fetch-pack, it gives us freedom to modify
or get rid of path_match later.
builtin/fetch-pack.c | 2 +-
builtin/send-pack.c | 3 +--
cache.h | 2 +-
connect.c | 3 ---
remote-curl.c | 2 +-
transport.c | 7 +++----
6 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c6bc8eb..46688dc 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -976,7 +976,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.verbose ? CONNECT_VERBOSE : 0);
}
- get_remote_heads(fd[0], &ref, 0, NULL, 0, NULL);
+ get_remote_heads(fd[0], &ref, 0, NULL);
ref = fetch_pack(&args, fd, conn, ref, dest,
nr_heads, heads, pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e0b8030..cd1115f 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -494,8 +494,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
memset(&extra_have, 0, sizeof(extra_have));
- get_remote_heads(fd[0], &remote_refs, 0, NULL, REF_NORMAL,
- &extra_have);
+ get_remote_heads(fd[0], &remote_refs, REF_NORMAL, &extra_have);
transport_verify_remote_names(nr_refspecs, refspecs);
diff --git a/cache.h b/cache.h
index 8c98d05..408e880 100644
--- a/cache.h
+++ b/cache.h
@@ -1034,7 +1034,7 @@ struct extra_have_objects {
int nr, alloc;
unsigned char (*array)[20];
};
-extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
+extern struct ref **get_remote_heads(int in, struct ref **list, unsigned int flags, struct extra_have_objects *);
extern int server_supports(const char *feature);
extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
diff --git a/connect.c b/connect.c
index 51990fa..48df90b 100644
--- a/connect.c
+++ b/connect.c
@@ -53,7 +53,6 @@ static void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1
* Read all the refs from the other end
*/
struct ref **get_remote_heads(int in, struct ref **list,
- int nr_match, char **match,
unsigned int flags,
struct extra_have_objects *extra_have)
{
@@ -92,8 +91,6 @@ struct ref **get_remote_heads(int in, struct ref **list,
if (!check_ref(name, name_len, flags))
continue;
- if (nr_match && !path_match(name, nr_match, match))
- continue;
ref = alloc_ref(buffer + 41);
hashcpy(ref->old_sha1, old_sha1);
*list = ref;
diff --git a/remote-curl.c b/remote-curl.c
index 0e720ee..94dc488 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -200,7 +200,7 @@ static struct ref *parse_git_refs(struct discovery *heads)
if (start_async(&async))
die("cannot start thread to parse advertised refs");
- get_remote_heads(async.out, &list, 0, NULL, 0, NULL);
+ get_remote_heads(async.out, &list, 0, NULL);
close(async.out);
if (finish_async(&async))
die("ref parsing thread failed");
diff --git a/transport.c b/transport.c
index 51814b5..c2245d4 100644
--- a/transport.c
+++ b/transport.c
@@ -502,7 +502,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
struct ref *refs;
connect_setup(transport, for_push, 0);
- get_remote_heads(data->fd[0], &refs, 0, NULL,
+ get_remote_heads(data->fd[0], &refs,
for_push ? REF_NORMAL : 0, &data->extra_have);
data->got_remote_heads = 1;
@@ -537,7 +537,7 @@ static int fetch_refs_via_pack(struct transport *transport,
if (!data->got_remote_heads) {
connect_setup(transport, 0, 0);
- get_remote_heads(data->fd[0], &refs_tmp, 0, NULL, 0, NULL);
+ get_remote_heads(data->fd[0], &refs_tmp, 0, NULL);
data->got_remote_heads = 1;
}
@@ -772,8 +772,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
struct ref *tmp_refs;
connect_setup(transport, 1, 0);
- get_remote_heads(data->fd[0], &tmp_refs, 0, NULL, REF_NORMAL,
- NULL);
+ get_remote_heads(data->fd[0], &tmp_refs, REF_NORMAL, NULL);
data->got_remote_heads = 1;
}
--
1.7.8.13.g74677
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack
2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King
2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King
@ 2011-12-13 0:44 ` Jeff King
2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King
2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King
3 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:44 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
The fetch-pack documentation is very clear that refs given
on the command line are to be full refs:
<refs>...::
The remote heads to update from. This is relative to
$GIT_DIR (e.g. "HEAD", "refs/heads/master"). When
unspecified, update from all heads the remote side has.
and this has been the case since fetch-pack was originally documented in
8b3d9dc ([PATCH] Documentation: clone/fetch/upload., 2005-07-14).
Let's follow our own documentation to set a good example,
and to avoid breaking when this restriction is enforced in
the next patch.
Signed-off-by: Jeff King <peff@peff.net>
---
This patch by itself isn't a big deal for backwards compatibility But
if we are violating the constraints in the documentation, then I worry
that others are, too. On the other hand, I seriously doubt anyone is
actually using fetch-pack directly anymore at all.
t/t5500-fetch-pack.sh | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index bafcca7..9bf69e9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -97,7 +97,7 @@ test_expect_success 'setup' '
git symbolic-ref HEAD refs/heads/B
'
-pull_to_client 1st "B A" $((11*3))
+pull_to_client 1st "refs/heads/B refs/heads/A" $((11*3))
test_expect_success 'post 1st pull setup' '
add A11 $A10 &&
@@ -110,9 +110,9 @@ test_expect_success 'post 1st pull setup' '
done
'
-pull_to_client 2nd "B" $((64*3))
+pull_to_client 2nd "refs/heads/B" $((64*3))
-pull_to_client 3rd "A" $((1*3))
+pull_to_client 3rd "refs/heads/A" $((1*3))
test_expect_success 'clone shallow' '
git clone --depth 2 "file://$(pwd)/." shallow
--
1.7.8.13.g74677
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/4] fetch-pack: match refs exactly
2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King
2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King
2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King
@ 2011-12-13 0:48 ` Jeff King
2011-12-13 0:54 ` Jeff King
2011-12-15 21:46 ` Junio C Hamano
2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King
3 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:48 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
When we are determining the list of refs to fetch via
fetch-pack, we have two sets of refs to compare: those on
the remote side, and a "match" list of things we want to
fetch. We iterate through the remote refs alphabetically,
seeing if each one is wanted by the "match" list.
Since def88e9 (Commit first cut at "git-fetch-pack",
2005-07-04), we have used the "path_match" function to do a
suffix match, where a remote ref is considered wanted if
any of the "match" elements is a suffix of the remote
refname.
This enables callers of fetch-pack to specify unqualified
refs and have them matched up with remote refs (e.g., ask
for "A" and get remote's "refs/heads/A"). However, if you
provide a fully qualified ref, then there are corner cases
where we provide the wrong answer. For example, given a
remote with two refs:
refs/foo/refs/heads/master
refs/heads/master
asking for "refs/heads/master" will first match
"refs/foo/refs/heads/master" by the suffix rule, and we will
erroneously fetch it instead of refs/heads/master.
As it turns out, all callers of fetch_pack do provide
fully-qualified refs for the match list. There are two ways
fetch_pack can get match lists:
1. Through the transport code (i.e., via git-fetch)
2. On the command-line of git-fetch-pack
In the first case, we will always be providing the names of
fully-qualified refs from "struct ref" objects. We will have
pre-matched those ref objects already (since we have to
handle more advanced matching, like wildcard refspecs), and
are just providing a list of the refs whose objects we need.
In the second case, users could in theory be providing
non-qualified refs on the command-line. However, the
fetch-pack documentation claims that refs should be fully
qualified (and has always done so since it was written in
2005).
Let's change this path_match call to simply check for string
equality, matching what the callers of fetch_pack are
expecting.
Signed-off-by: Jeff King <peff@peff.net>
---
This is obviously the one that can break existing fetch-pack users. I
doubt they exist. If they do, there are a few alternatives:
1. Come up with some more sane rules for path_match (e.g., try full
strings first, use full-string matching for things starting with
"refs/", etc).
2. Leave the matching in-place for git-fetch-pack, but use exact
matching for internal users that will always provide qualified refs
(i.e., "git fetch").
I don't personally think it's worth the trouble.
builtin/fetch-pack.c | 13 +++++++++----
t/t5527-fetch-odd-refs.sh | 29 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 4 deletions(-)
create mode 100755 t/t5527-fetch-odd-refs.sh
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 46688dc..6207ecd 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
continue;
}
else {
- int order = path_match(ref->name, nr_match, match);
- if (order) {
- return_refs[order-1] = ref;
- continue; /* we will link it later */
+ int i;
+ for (i = 0; i < nr_match; i++) {
+ if (!strcmp(ref->name, match[i])) {
+ match[i][0] = '\0';
+ return_refs[i] = ref;
+ break;
+ }
}
+ if (i < nr_match)
+ continue; /* we will link it later */
}
free(ref);
}
diff --git a/t/t5527-fetch-odd-refs.sh b/t/t5527-fetch-odd-refs.sh
new file mode 100755
index 0000000..edea9f9
--- /dev/null
+++ b/t/t5527-fetch-odd-refs.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test fetching of oddly-named refs'
+. ./test-lib.sh
+
+# afterwards we will have:
+# HEAD - two
+# refs/for/refs/heads/master - one
+# refs/heads/master - three
+test_expect_success 'setup repo with odd suffix ref' '
+ echo content >file &&
+ git add . &&
+ git commit -m one &&
+ git update-ref refs/for/refs/heads/master HEAD &&
+ echo content >>file &&
+ git commit -a -m two &&
+ echo content >>file &&
+ git commit -a -m three &&
+ git checkout HEAD^
+'
+
+test_expect_success 'suffix ref is ignored during fetch' '
+ git clone --bare file://"$PWD" suffix &&
+ echo three >expect &&
+ git --git-dir=suffix log -1 --format=%s refs/heads/master >actual &&
+ test_cmp expect actual
+'
+
+test_done
--
1.7.8.13.g74677
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 4/4] connect.c: drop path_match function
2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King
` (2 preceding siblings ...)
2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King
@ 2011-12-13 0:49 ` Jeff King
2011-12-13 3:23 ` Michael Haggerty
3 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:49 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
This function was used for comparing local and remote ref
names during fetch (which makes it a candidate for "most
confusingly named function of the year").
It no longer has any callers, so let's get rid of it.
Signed-off-by: Jeff King <peff@peff.net>
---
I just did the exact-string match inline in the previous patch. I could
also have modified path_match to do it. But really, I can't think of a
worse name for a global function in a system which is all about
managing content in paths. Unless, you know, it actually matched paths.
Which it doesn't.
cache.h | 1 -
connect.c | 21 ---------------------
2 files changed, 0 insertions(+), 22 deletions(-)
diff --git a/cache.h b/cache.h
index 408e880..2ad063f 100644
--- a/cache.h
+++ b/cache.h
@@ -1029,7 +1029,6 @@ struct ref {
extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
extern int finish_connect(struct child_process *conn);
extern int git_connection_is_socket(struct child_process *conn);
-extern int path_match(const char *path, int nr, char **match);
struct extra_have_objects {
int nr, alloc;
unsigned char (*array)[20];
diff --git a/connect.c b/connect.c
index 48df90b..2a0a040 100644
--- a/connect.c
+++ b/connect.c
@@ -105,27 +105,6 @@ int server_supports(const char *feature)
strstr(server_capabilities, feature) != NULL;
}
-int path_match(const char *path, int nr, char **match)
-{
- int i;
- int pathlen = strlen(path);
-
- for (i = 0; i < nr; i++) {
- char *s = match[i];
- int len = strlen(s);
-
- if (!len || len > pathlen)
- continue;
- if (memcmp(path + pathlen - len, s, len))
- continue;
- if (pathlen > len && path[pathlen - len - 1] != '/')
- continue;
- *s = 0;
- return (i + 1);
- }
- return 0;
-}
-
enum protocol {
PROTO_LOCAL = 1,
PROTO_SSH,
--
1.7.8.13.g74677
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 3/4] fetch-pack: match refs exactly
2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King
@ 2011-12-13 0:54 ` Jeff King
2011-12-15 21:46 ` Junio C Hamano
1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2011-12-13 0:54 UTC (permalink / raw)
To: git; +Cc: Kevin Sawicki
On Mon, Dec 12, 2011 at 07:48:08PM -0500, Jeff King wrote:
> diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
> index 46688dc..6207ecd 100644
> --- a/builtin/fetch-pack.c
> +++ b/builtin/fetch-pack.c
> @@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
> continue;
> }
> else {
> - int order = path_match(ref->name, nr_match, match);
> - if (order) {
> - return_refs[order-1] = ref;
> - continue; /* we will link it later */
> + int i;
> + for (i = 0; i < nr_match; i++) {
> + if (!strcmp(ref->name, match[i])) {
> + match[i][0] = '\0';
> + return_refs[i] = ref;
> + break;
> + }
> }
> + if (i < nr_match)
> + continue; /* we will link it later */
Astute readers may notice that our matching scheme is now something
like:
for ref in refs
for match in matches
strcmp(ref, match)
If you are fetching everything, that's O(n^2) strcmps (where n is the
number of remote refs). If we sorted the match list (the refs list is
already sorted), we could turn it into an O(n lg n) sort+merge. This is
an optimization we couldn't do with the old suffix-matching rule.
I doubt it matters in any practical case. Even for our crazy 100K ref
cases at GitHub, we don't tend to actually fetch all of those at one
time. So it is more like O(n * m), where m is probably in the dozens at
most.
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 4/4] connect.c: drop path_match function
2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King
@ 2011-12-13 3:23 ` Michael Haggerty
0 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2011-12-13 3:23 UTC (permalink / raw)
To: Jeff King; +Cc: git, Kevin Sawicki
On 12/13/2011 01:49 AM, Jeff King wrote:
> I just did the exact-string match inline in the previous patch. I could
> also have modified path_match to do it. But really, I can't think of a
> worse name for a global function in a system which is all about
> managing content in paths. Unless, you know, it actually matched paths.
> Which it doesn't.
The tacit assumption that a reference is always a file is all over the
code and is often confusing. My
[PATCH v2 02/51] refs: rename "refname" variables
is a step towards making the distinction more explicit, at least in the
refs code.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 3/4] fetch-pack: match refs exactly
2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King
2011-12-13 0:54 ` Jeff King
@ 2011-12-15 21:46 ` Junio C Hamano
1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2011-12-15 21:46 UTC (permalink / raw)
To: Jeff King; +Cc: git, Kevin Sawicki
Jeff King <peff@peff.net> writes:
> This is obviously the one that can break existing fetch-pack users. I
> doubt they exist. If they do, there are a few alternatives:
>
> 1. Come up with some more sane rules for path_match (e.g., try full
> strings first, use full-string matching for things starting with
> "refs/", etc).
>
> 2. Leave the matching in-place for git-fetch-pack, but use exact
> matching for internal users that will always provide qualified refs
> (i.e., "git fetch").
I think the latter is the sane thing to do _if_ this becomes an issue, and
as you mention, it is in line with what "git fetch" wrapper already
does.
Given that fetch-pack was meant to be driven by "git fetch" wrapper that
turns the command line and other refspecs into full refnames on the remote
end before calling it, and also as you mentionied that it is clearly
documented as "relative to $GIT_DIR", I do not think we should support the
tail-match semantics at all in the first place.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-12-15 21:46 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-13 0:39 [PATCH 0/4] exact ref-matching for fetch-pack Jeff King
2011-12-13 0:41 ` [PATCH 1/4] drop "match" parameter from get_remote_heads Jeff King
2011-12-13 0:44 ` [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack Jeff King
2011-12-13 0:48 ` [PATCH 3/4] fetch-pack: match refs exactly Jeff King
2011-12-13 0:54 ` Jeff King
2011-12-15 21:46 ` Junio C Hamano
2011-12-13 0:49 ` [PATCH 4/4] connect.c: drop path_match function Jeff King
2011-12-13 3:23 ` Michael Haggerty
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).