* Bug in fetch-pack.c, please confirm
@ 2015-03-15 6:37 Kyle J. McKay
2015-03-15 7:27 ` Junio C Hamano
2015-03-16 1:13 ` Jeff King
0 siblings, 2 replies; 14+ messages in thread
From: Kyle J. McKay @ 2015-03-15 6:37 UTC (permalink / raw)
To: Jeff King, Junio C Hamano; +Cc: Git mailing list
Hi guys,
So I was looking at fetch-pack.c (from master @ 52cae643, but I think
it's the same everywhere):
# Lines 626-648
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const unsigned char *remote = ref->old_sha1;
unsigned char local[20];
struct object *o;
o = lookup_object(remote);
if (!o || !(o->flags & COMPLETE)) {
retval = 0;
if (!args->verbose)
continue;
fprintf(stderr,
"want %s (%s)\n", sha1_to_hex(remote),
ref->name);
continue;
}
hashcpy(ref->new_sha1, local);
if (!args->verbose)
continue;
fprintf(stderr,
"already have %s (%s)\n", sha1_to_hex(remote),
ref->name);
}
Peff, weren't you having some issue with want and have and hide refs?
Tell me please how the "local" variable above gets initialized?
It's declared on the stack inside the for loop scope so only
guaranteed to have garbage.
It's passed to hashcpy which has this prototype:
inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src);
So it looks to me like garbage is copied into rev->new_sha1, yes?
Something's very wrong here.
It looks to me like the bug was introduced in 49bb805e (Do not ask for
objects known to be complete. 2005-10-19).
I've taken a stab a a fix below.
-Kyle
-- 8< --
Subject: [PATCH] fetch-pack.c: do not use uninitialized sha1 value
Since 49bb805e (Do not ask for objects known to be complete. 2005-10-19)
when the read_ref call was replaced with a parse_object call, the
automatic variable 'local' containing an sha1 has been left uninitialized.
Subsequently in 1baaae5e (Make maximal use of the remote refs, 2005-10-28)
the parse_object call was replaced with a lookup_object call but still
the 'local' variable was left uninitialized.
However, it's used as the source to update another sha1 value in the case
that we already have it and in that case the other ref will end up with
whatever garbage was in the 'local' variable.
Fix this by removing the 'local' variable and using the value from the
result of the lookup_object call instead.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
---
fetch-pack.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 655ee642..c0b4d84f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -621,29 +621,28 @@ static int everything_local(struct fetch_pack_args *args,
}
}
filter_refs(args, refs, sought, nr_sought);
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const unsigned char *remote = ref->old_sha1;
- unsigned char local[20];
struct object *o;
o = lookup_object(remote);
if (!o || !(o->flags & COMPLETE)) {
retval = 0;
if (!args->verbose)
continue;
fprintf(stderr,
"want %s (%s)\n", sha1_to_hex(remote),
ref->name);
continue;
}
- hashcpy(ref->new_sha1, local);
+ hashcpy(ref->new_sha1, o->sha1);
if (!args->verbose)
continue;
fprintf(stderr,
"already have %s (%s)\n", sha1_to_hex(remote),
ref->name);
}
return retval;
---
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-15 6:37 Bug in fetch-pack.c, please confirm Kyle J. McKay
@ 2015-03-15 7:27 ` Junio C Hamano
2015-03-15 7:30 ` Junio C Hamano
2015-03-16 1:13 ` Jeff King
1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2015-03-15 7:27 UTC (permalink / raw)
To: Kyle J. McKay; +Cc: Jeff King, Git mailing list
"Kyle J. McKay" <mackyle@gmail.com> writes:
> Hi guys,
>
> So I was looking at fetch-pack.c (from master @ 52cae643, but I think
> it's the same everywhere):
>
> # Lines 626-648
>
49bb805e (Do not ask for objects known to be complete., 2005-10-19)
seems to lose the assignment to local[].
> Something's very wrong here.
>
> It looks to me like the bug was introduced in 49bb805e (Do not ask for
> objects known to be complete. 2005-10-19).
I read that "lines 626-648", stopped reading your message and did a
blame session myself, and arrived at the same culprit.
The very original code that read from the ref directly here; I do
not think it was an attempt to catch a race condition where
ref->old_sha1 got stale and the ref on the filesystem has a newer
value. Hence, I think copying from o->sha1 like you did is a fix
that is the most faithful to the original code.
o is lookup_object(remote) which is lookup_object(ref->old_sha1);
that makes o->sha1 always be ref->old_sha1, so either would be fine,
but o->sha1 would be more appropriate in this codeflow. We grab o,
if it is NULL or if we do not like it, we do something and continue,
otherwise we like that o so grabbing o->sha1 out of it makes the
logic look flowing right, at least to me ;-)
> -- 8< --
> Subject: [PATCH] fetch-pack.c: do not use uninitialized sha1 value
>
> Since 49bb805e (Do not ask for objects known to be complete. 2005-10-19)
> when the read_ref call was replaced with a parse_object call, the
> automatic variable 'local' containing an sha1 has been left uninitialized.
>
> Subsequently in 1baaae5e (Make maximal use of the remote refs, 2005-10-28)
> the parse_object call was replaced with a lookup_object call but still
> the 'local' variable was left uninitialized.
>
> However, it's used as the source to update another sha1 value in the case
> that we already have it and in that case the other ref will end up with
> whatever garbage was in the 'local' variable.
>
> Fix this by removing the 'local' variable and using the value from the
> result of the lookup_object call instead.
>
> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
> ---
> fetch-pack.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 655ee642..c0b4d84f 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -621,29 +621,28 @@ static int everything_local(struct fetch_pack_args *args,
> }
> }
>
> filter_refs(args, refs, sought, nr_sought);
>
> for (retval = 1, ref = *refs; ref ; ref = ref->next) {
> const unsigned char *remote = ref->old_sha1;
> - unsigned char local[20];
> struct object *o;
>
> o = lookup_object(remote);
> if (!o || !(o->flags & COMPLETE)) {
> retval = 0;
> if (!args->verbose)
> continue;
> fprintf(stderr,
> "want %s (%s)\n", sha1_to_hex(remote),
> ref->name);
> continue;
> }
>
> - hashcpy(ref->new_sha1, local);
> + hashcpy(ref->new_sha1, o->sha1);
> if (!args->verbose)
> continue;
> fprintf(stderr,
> "already have %s (%s)\n", sha1_to_hex(remote),
> ref->name);
> }
> return retval;
> ---
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-15 7:27 ` Junio C Hamano
@ 2015-03-15 7:30 ` Junio C Hamano
2015-03-15 22:21 ` Kyle J. McKay
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2015-03-15 7:30 UTC (permalink / raw)
To: Kyle J. McKay; +Cc: Jeff King, Git mailing list
Junio C Hamano <gitster@pobox.com> writes:
> "Kyle J. McKay" <mackyle@gmail.com> writes:
>
>> Hi guys,
>>
>> So I was looking at fetch-pack.c (from master @ 52cae643, but I think
>> it's the same everywhere):
>>
> ...
>> - hashcpy(ref->new_sha1, local);
>> + hashcpy(ref->new_sha1, o->sha1);
>> if (!args->verbose)
>> continue;
>> fprintf(stderr,
>> "already have %s (%s)\n", sha1_to_hex(remote),
>> ref->name);
>> }
>> return retval;
>> ---
One thing I wonder is if this hashcpy() is doing anything useful,
though. Is ref->new_sha1 used after we are done in this codepath,
or is the reason nobody noticed it is because it does not matter
whatever garbage is in that field nobody looks at it?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-15 7:30 ` Junio C Hamano
@ 2015-03-15 22:21 ` Kyle J. McKay
0 siblings, 0 replies; 14+ messages in thread
From: Kyle J. McKay @ 2015-03-15 22:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Git mailing list
On Mar 15, 2015, at 00:30, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Kyle J. McKay" <mackyle@gmail.com> writes:
>>
>>> Hi guys,
>>>
>>> So I was looking at fetch-pack.c (from master @ 52cae643, but I
>>> think
>>> it's the same everywhere):
>>>
>> ...
>>> - hashcpy(ref->new_sha1, local);
>>> + hashcpy(ref->new_sha1, o->sha1);
>>> if (!args->verbose)
>>> continue;
>>> fprintf(stderr,
>>> "already have %s (%s)\n", sha1_to_hex(remote),
>>> ref->name);
>>> }
>>> return retval;
>>> ---
>
> One thing I wonder is if this hashcpy() is doing anything useful,
> though. Is ref->new_sha1 used after we are done in this codepath,
> or is the reason nobody noticed it is because it does not matter
> whatever garbage is in that field nobody looks at it?
My thoughts exactly. hence the "please confirm" request. I'm not
familiar enough with the fetch pack code to know the answer off the
top of my head. I was hoping someone else who's been in the fetch
pack code recently (*Hi Peff*) might just already know. :)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-15 6:37 Bug in fetch-pack.c, please confirm Kyle J. McKay
2015-03-15 7:27 ` Junio C Hamano
@ 2015-03-16 1:13 ` Jeff King
2015-03-19 17:41 ` Junio C Hamano
1 sibling, 1 reply; 14+ messages in thread
From: Jeff King @ 2015-03-16 1:13 UTC (permalink / raw)
To: Kyle J. McKay; +Cc: Junio C Hamano, Git mailing list
On Sat, Mar 14, 2015 at 11:37:37PM -0700, Kyle J. McKay wrote:
> Peff, weren't you having some issue with want and have and hide refs?
Yes, but the problem was on the server side. I didn't look at this code
at all. :)
> Tell me please how the "local" variable above gets initialized?
So I think we all agree that this is bogus code, and the fix you
proposed is reasonable (but spoiler alert, if you read to the end, I
think we can do something even more extreme). Like Junio, I am puzzled
not by the bug itself, but by its lack of effect over the past 10 years.
I'm not all that familiar with this code, so I started with some rather
blunt-hammer tracing.
If you replace that hashcpy with "exit(1)", you get quite a few failures
in the test suite. So we really are running the code. Interestingly, if
you replace it with "hashclr(ref->old_sha1)" (note old, not new, so
impacting the sha1 that we know is actually valid), you still get some
failures, but many fewer.
It looks like one of the ways to hit this code path is by doing a clone
in which we don't actually need any objects (e.g., using "--reference").
Clone runs fetch-pack to acquire the objects, but then throws away the
returned refs and writes its own.
But there are still some failures, so clearly some code paths actually
do look at the resulting ref. But nobody seems to care about
ref->new_sha1. Let's take one of the cases that the hashclr() experiment
found and trace that. In t1507, we do a "git fetch" that hits this case.
I instrumented it like this:
diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh
index 1978947..e2a192d 100755
--- a/t/t1507-rev-parse-upstream.sh
+++ b/t/t1507-rev-parse-upstream.sh
@@ -82,7 +82,7 @@ test_expect_success 'refs/heads/my-side@{upstream} does not resolve to my-side{u
test_expect_success 'my-side@{u} resolves to correct commit' '
git checkout side &&
test_commit 5 &&
- (cd clone && git fetch) &&
+ (cd clone && gdb --args ../../../git fetch </dev/tty >/dev/tty) &&
test 2 = "$(commit_subject my-side)" &&
test 5 = "$(commit_subject my-side@{u})"
'
If we break in everything_local at that hashcpy, we can see:
(gdb) print ref->name
$1 = 0x8422b0 "refs/heads/master"
(gdb) print sha1_to_hex(ref->old_sha1)
$2 = 0x815a40 <hexbuffer> "8f489d01d0cc65c3b0f09504ec50b5ed02a70bd5"
(gdb) print sha1_to_hex(ref->new_sha1)
$3 = 0x815a69 <hexbuffer+41> "f2795a000000000056134b", '0' <repeats 11 times>, "8000000"
So that's the bogus value copied in. Presumably it's random bytes and
not bleed over from some other sha1, as it's rather unlikely to have
that many zeroes in a real hash.
If we then leave everything_local, we see that the bogus value comes out
to do_fetch_pack via the "ref" parameter. We feed that into find_common,
but it only looks at ref->old_sha1, the other side. We keep going, and
do_fetch_pack returns the bogus ref up to fetch_pack(). That in turn
passes it up to fetch_refs_via_pack.
And there we stop. We don't pass the "refs" list out of that function
(which, as an aside, is probably a leak). Instead, we depend on the list
of heads we already knew in the "to_fetch" array. That comes from
processing the earlier list of refs returned from get_refs_via_connect.
I think it was the case once upon a time that fetch-pack did more of the
"which refs to fetch, and how to update them" logic. The pipeline for
fetch is now more like:
1. get_refs_via_connect returns a list of refs
2. caller munges the list of refs, based on refspecs
3. fetch_refs_via_pack takes that list of refs as input
Of course, this being git, there's always a way to try to access the
"old" code paths. :)
You can do something like:
git init
git commit --allow-empty -m foo
git fetch-pack --all .
which hits the same code path, but actually retains the return value
from fetch_pack(). But even there, it looks like we don't look at
ref->new_sha1 at all. fetch-pack doesn't update refs at all, but just
prints the list of new values to stdout.
So I think there is literally no code path that ever looks at the bogus
sha1. We could just drop that hashcpy() entirely.
That doesn't mean there isn't an additional bug lurking. That is,
_should_ somebody be caring about that value? I don't think so. The
old/new pairs in a "struct ref" are meaningful as "I proposed to update
to X, and we are at Y". But this list of refs does not have anything to
do with the update of local refs. It is only "what is the value of the
ref on the other side". The local refs are taken care of in a separate
list.
-Peff
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-16 1:13 ` Jeff King
@ 2015-03-19 17:41 ` Junio C Hamano
2015-03-19 18:55 ` Jeff King
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2015-03-19 17:41 UTC (permalink / raw)
To: Jeff King; +Cc: Kyle J. McKay, Git mailing list
Jeff King <peff@peff.net> writes:
> ...
> And there we stop. We don't pass the "refs" list out of that function
> (which, as an aside, is probably a leak). Instead, we depend on the list
> of heads we already knew in the "to_fetch" array. That comes from
> processing the earlier list of refs returned from get_refs_via_connect.
> ...
> That doesn't mean there isn't an additional bug lurking. That is,
> _should_ somebody be caring about that value? I don't think so. The
> old/new pairs in a "struct ref" are meaningful as "I proposed to update
> to X, and we are at Y". But this list of refs does not have anything to
> do with the update of local refs. It is only "what is the value of the
> ref on the other side". The local refs are taken care of in a separate
> list.
Correct. When we want to insert some function to allow users/hooks
to tweak the result of update, we might want to make sure that we
are passing the final list of refs to that function, but the purpose
of this function is not to make such a modification.
Just to make sure we do not keep this hanging forever and eventually
forget it, I'm planning to queue this.
Thanks.
-- >8 --
From: Jeff King <peff@peff.next>
Date: Sun, 15 Mar 2015 21:13:43 -0400
Subject: [PATCH] fetch-pack: remove dead assignment to ref->new_sha1
In everything_local(), we used to assign the current ref's value
found in ref->old_sha1 to ref->new_sha1 when we already have all the
necessary objects to complete the history leading to that commit.
This copying was broken at 49bb805e (Do not ask for objects known to
be complete., 2005-10-19) and ever since we instead stuffed a random
bytes in ref->new_sha1 here. No code complained or failed due to this
breakage.
It turns out that no codepath that comes after this assignment even
looks at ref->new_sha1 at all.
- The only caller of everything_local(), do_fetch_pack(), returns
this list of ref, whose element has bogus new_sha1 values, to its
caller. It does not look at the elements and acts on them.
- The only caller of do_fetch_pack(), fetch_pack(), returns this
list to its caller. It does not look at the elements and acts on
them.
- One of the two callers of fetch_pack() is cmd_fetch_pack(), the
top-level that implements "git fetch-pack". The only thing it
looks at in the elements of the returned ref list is the old_sha1
and name fields.
- The other caller of fetch_pack() is fetch_refs_via_pack() in the
transport layer, which is a helper that implements "git fetch".
It only cares about whether the returned list is empty (i.e.
failed to fetch anything).
Just drop the bogus assignment, that is not even necessary. The
remote-tracking refs are updated based on a different list and not
using the ref list being manipulated by this codepath; the caller
do_fetch_pack() created a copy of that real ref list and passed the
copy down to this function, and modifying the elements here does not
affect anything.
Noticed-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
fetch-pack.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 655ee64..6f0c0e1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -625,7 +625,6 @@ static int everything_local(struct fetch_pack_args *args,
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const unsigned char *remote = ref->old_sha1;
- unsigned char local[20];
struct object *o;
o = lookup_object(remote);
@@ -638,8 +637,6 @@ static int everything_local(struct fetch_pack_args *args,
ref->name);
continue;
}
-
- hashcpy(ref->new_sha1, local);
if (!args->verbose)
continue;
fprintf(stderr,
--
2.3.3-377-gdf43604
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-19 17:41 ` Junio C Hamano
@ 2015-03-19 18:55 ` Jeff King
2015-03-19 19:01 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2015-03-19 18:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
On Thu, Mar 19, 2015 at 10:41:50AM -0700, Junio C Hamano wrote:
> Just to make sure we do not keep this hanging forever and eventually
> forget it, I'm planning to queue this.
Thanks for following up. A few minor nits (and maybe a more serious
problem) on the explanation in the commit message:
> be complete., 2005-10-19) and ever since we instead stuffed a random
> bytes in ref->new_sha1 here.
s/a random/random/
> It turns out that no codepath that comes after this assignment even
> looks at ref->new_sha1 at all.
>
> - The only caller of everything_local(), do_fetch_pack(), returns
> this list of ref, whose element has bogus new_sha1 values, to its
> caller. It does not look at the elements and acts on them.
s/list of ref/list of refs/ ? Or did you mean "the list we store in the
'ref' variable"?
I'm not sure what the final sentence means. Should it be "It does not
look at the elements nor act on them"?
do_fetch_pack actually does pass them down to find_common. But the
latter does not look at ref->new_sha1, so we're OK.
> - The only caller of do_fetch_pack(), fetch_pack(), returns this
> list to its caller. It does not look at the elements and acts on
> them.
Ditto on the wording in the final sentence here (but it is correct in
this case that we do not touch the list at all).
> - The other caller of fetch_pack() is fetch_refs_via_pack() in the
> transport layer, which is a helper that implements "git fetch".
> It only cares about whether the returned list is empty (i.e.
> failed to fetch anything).
So I thought I would follow this up by adding a free_refs() call in
fetch_refs_via_pack. After all, we just leak that list, right?
If only it were so simple. It turns out that the list returned from
fetch_pack() is _mostly_ a copy of the incoming refs list we give it.
But in filter_refs(), if allow_tip_sha1_in_want is set, we actually add
the unmatched "sought" refs here, too.
Which means we may be setting new_sha1 in one of these "sought" refs,
and that broadens the scope of code that might be affected by the patch
we have been discussing. However, I think the filter_refs code is wrong,
and should be making a copy of the sought refs to put in the new list.
I'm working up a few patches in that area, which I'll send out in a few
minutes. Once that is done, then I think the explanation you give above
would be correct.
-Peff
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-19 18:55 ` Jeff King
@ 2015-03-19 19:01 ` Junio C Hamano
2015-03-19 20:31 ` Jeff King
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2015-03-19 19:01 UTC (permalink / raw)
To: Jeff King; +Cc: Kyle J. McKay, Git mailing list
Jeff King <peff@peff.net> writes:
>> - The only caller of everything_local(), do_fetch_pack(), returns
>> this list of ref, whose element has bogus new_sha1 values, to its
>> caller. It does not look at the elements and acts on them.
>
> I'm not sure what the final sentence means. Should it be "It does not
> look at the elements nor act on them"?
Yes. "It does not look at the new_sha1. It does not use the
information to change its behaviour. Both of the previous two
statements are true." is what I meant.
> do_fetch_pack actually does pass them down to find_common. But the
> latter does not look at ref->new_sha1, so we're OK.
>> - The other caller of fetch_pack() is fetch_refs_via_pack() in the
>> transport layer, which is a helper that implements "git fetch".
>> It only cares about whether the returned list is empty (i.e.
>> failed to fetch anything).
>
> So I thought I would follow this up by adding a free_refs() call in
> fetch_refs_via_pack. After all, we just leak that list, right?
Yeah, I agree.
> I'm working up a few patches in that area, which I'll send out in a few
> minutes. Once that is done, then I think the explanation you give above
> would be correct.
If a follow-up is coming then I'd just drop this one. Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug in fetch-pack.c, please confirm
2015-03-19 19:01 ` Junio C Hamano
@ 2015-03-19 20:31 ` Jeff King
2015-03-19 20:34 ` [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage Jeff King
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Jeff King @ 2015-03-19 20:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
On Thu, Mar 19, 2015 at 12:01:26PM -0700, Junio C Hamano wrote:
> > I'm working up a few patches in that area, which I'll send out in a few
> > minutes. Once that is done, then I think the explanation you give above
> > would be correct.
>
> If a follow-up is coming then I'd just drop this one. Thanks.
OK, here it is. Took me a bit longer than I expected, as I wanted to
figure out whether the second patch was actually fixing a bug (and if
so, to add test coverage). Turns out that it is a real bug.
The final patch is what you sent, rebased on top (though there are not
any code changes; the underlying commits make the _explanation_ true,
but no code change was required). I fixed up the nits I mentioned in my
earlier email.
[1/4]: filter_ref: avoid overwriting ref->old_sha1 with garbage
[2/4]: filter_ref: make a copy of extra "sought" entries
[3/4]: fetch_refs_via_pack: free extra copy of refs
[4/4]: fetch-pack: remove dead assignment to ref->new_sha1
-Peff
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage
2015-03-19 20:31 ` Jeff King
@ 2015-03-19 20:34 ` Jeff King
2015-03-19 21:09 ` Junio C Hamano
2015-03-19 20:37 ` [PATCH 2/4] filter_ref: make a copy of extra "sought" entries Jeff King
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2015-03-19 20:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
If the server supports allow_tip_sha1_in_want, then
fetch-pack's filter_refs function tries to check whether a
ref is a request for a straight sha1 by running:
if (get_sha1_hex(ref->name, ref->old_sha1))
...
I.e., we are using get_sha1_hex to ask "is this ref name a
sha1?". If it is true, then the contents of ref->old_sha1
will end up unchanged. But if it is false, then get_sha1_hex
makes no guarantees about what it has written. With a ref
name like "abcdefoo", we would overwrite 3 bytes of
ref->old_sha1 before realizing that it was not a sha1.
This is likely not a problem in practice, as anything in
refs->name (besides a sha1) will start with "refs/", meaning
that we would notice on the first character that there is a
problem. Still, we are making assumptions about the state
left in the output when get_sha1_hex returns an error (e.g.,
it could start from the end of the string, or error check
the values only once they were placed in the output). It's
better to be defensive.
We could just check that we have exactly 40 characters of
sha1. But let's be even more careful and make sure that we
have a 40-char hex refname that matches what is in old_sha1.
This is perhaps overly defensive, but spells out our
assumptions clearly.
Signed-off-by: Jeff King <peff@peff.net>
---
This one is not necessary for the others, of course, and we can drop it
if you disagree with the reasoning.
I wonder if the thinking in the original was that it was our
responsibility here to make sure that ref->old_sha1 was filled in. It is
always filled in by the caller who gives us "sought", which makes sense
to me (this matches the rest of the "sought" entries, which come from
reading the remote's ref list, and of course must fill in old_sha1 from
that list).
fetch-pack.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 655ee64..058c258 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -544,10 +544,14 @@ static void filter_refs(struct fetch_pack_args *args,
/* Append unmatched requests to the list */
if (allow_tip_sha1_in_want) {
for (i = 0; i < nr_sought; i++) {
+ unsigned char sha1[20];
+
ref = sought[i];
if (ref->matched)
continue;
- if (get_sha1_hex(ref->name, ref->old_sha1))
+ if (get_sha1_hex(ref->name, sha1) ||
+ ref->name[40] != '\0' ||
+ hashcmp(sha1, ref->old_sha1))
continue;
ref->matched = 1;
--
2.3.3.520.g3cfbb5d
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/4] filter_ref: make a copy of extra "sought" entries
2015-03-19 20:31 ` Jeff King
2015-03-19 20:34 ` [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage Jeff King
@ 2015-03-19 20:37 ` Jeff King
2015-03-19 20:38 ` [PATCH 3/4] fetch_refs_via_pack: free extra copy of refs Jeff King
2015-03-19 20:39 ` [PATCH 4/4] fetch-pack: remove dead assignment to ref->new_sha1 Jeff King
3 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2015-03-19 20:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
If the server supports allow_tip_sha1_in_want, we add any
unmatched raw-sha1 entries in our "sought" list of refs to
the list of refs we will ask the other side for. We do so by
inserting the original "struct ref" directly into our list,
rather than making a copy. This has several problems.
The most minor problem is that one cannot ever free the
resulting list; it contains structs that are copies of the
remote refs (made earlier by fetch_pack) along with sought
refs that are referenced elsewhere.
But more importantly that we set the ref->next pointer to
NULL, chopping off the remainder of any existing list that
the ref was a part of. We get the set of "sought" refs in
an array rather than a linked list, but that array is often
in turn generated from a list. The test modification in
t5516 demonstrates this. Rather than fetching just an exact
sha1, we fetch that sha1 plus another ref:
- we build a linked list of refs to fetch when do_fetch
calls get_ref_map; the exact sha1 is first, followed by
the named ref ("refs/heads/extra" in this case).
- we pass that linked list to transport_fetch_ref, which
squashes it into an array of pointers
- that array goes to fetch_pack, which calls filter_ref.
There we generate the want list from a mix of what the
remote side has advertised, and the "sought" entry for
the exact sha1. We set the sought entry's "next" pointer
to NULL.
- after we return from transport_fetch_refs, we then try
to update the refs by following the linked list. But our
list is now truncated, and we do not update
refs/heads/extra at all.
We can fix this by making a copy of the ref. There's nothing
that fetch_pack does to it that must be reflected in the
original "sought" list (and indeed, if that were the case we
would have a serious bug, because it is only exact-sha1
entries which are treated this way).
Signed-off-by: Jeff King <peff@peff.net>
---
I always feel a little dirty modifying an existing test rather than
writing a new one that more clearly demonstrates what we are testing.
But the setup involved in the exact-sha1 fetch is a little complicated,
so this seemed easiest (and anybody blaming the change can hopefully
rely on the explanation above to understand what is going on).
fetch-pack.c | 5 ++---
t/t5516-fetch-push.sh | 13 ++++++++++---
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 058c258..84d5a66 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -555,9 +555,8 @@ static void filter_refs(struct fetch_pack_args *args,
continue;
ref->matched = 1;
- *newtail = ref;
- ref->next = NULL;
- newtail = &ref->next;
+ *newtail = copy_ref(ref);
+ newtail = &(*newtail)->next;
}
}
*refs = newlist;
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 630885d..5e04d64 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1107,9 +1107,16 @@ test_expect_success 'fetch exact SHA1' '
git config uploadpack.allowtipsha1inwant true
) &&
- git fetch -v ../testrepo $the_commit:refs/heads/copy &&
- result=$(git rev-parse --verify refs/heads/copy) &&
- test "$the_commit" = "$result"
+ git fetch -v ../testrepo $the_commit:refs/heads/copy master:refs/heads/extra &&
+ cat >expect <<-EOF &&
+ $the_commit
+ $the_first_commit
+ EOF
+ {
+ git rev-parse --verify refs/heads/copy &&
+ git rev-parse --verify refs/heads/extra
+ } >actual &&
+ test_cmp expect actual
)
'
--
2.3.3.520.g3cfbb5d
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/4] fetch_refs_via_pack: free extra copy of refs
2015-03-19 20:31 ` Jeff King
2015-03-19 20:34 ` [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage Jeff King
2015-03-19 20:37 ` [PATCH 2/4] filter_ref: make a copy of extra "sought" entries Jeff King
@ 2015-03-19 20:38 ` Jeff King
2015-03-19 20:39 ` [PATCH 4/4] fetch-pack: remove dead assignment to ref->new_sha1 Jeff King
3 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2015-03-19 20:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
When fetch_refs_via_pack calls fetch_pack(), we pass a
list of refs to fetch, and the function returns either a
copy of that list, with the fetched items filled in, or
NULL. We check the return value to see whether the fetch was
successful, but do not otherwise look at the copy, and
simply leak it at the end of the function.
Signed-off-by: Jeff King <peff@peff.net>
---
Without patch 2, this segfaults in t5516. :)
transport.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/transport.c b/transport.c
index 0694a7c..b1953b2 100644
--- a/transport.c
+++ b/transport.c
@@ -519,7 +519,7 @@ static int fetch_refs_via_pack(struct transport *transport,
int nr_heads, struct ref **to_fetch)
{
struct git_transport_data *data = transport->data;
- const struct ref *refs;
+ struct ref *refs;
char *dest = xstrdup(transport->url);
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
@@ -552,15 +552,17 @@ static int fetch_refs_via_pack(struct transport *transport,
&transport->pack_lockfile);
close(data->fd[0]);
close(data->fd[1]);
- if (finish_connect(data->conn))
+ if (finish_connect(data->conn)) {
+ free_refs(refs);
refs = NULL;
+ }
data->conn = NULL;
data->got_remote_heads = 0;
data->options.self_contained_and_connected =
args.self_contained_and_connected;
free_refs(refs_tmp);
-
+ free_refs(refs);
free(dest);
return (refs ? 0 : -1);
}
--
2.3.3.520.g3cfbb5d
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/4] fetch-pack: remove dead assignment to ref->new_sha1
2015-03-19 20:31 ` Jeff King
` (2 preceding siblings ...)
2015-03-19 20:38 ` [PATCH 3/4] fetch_refs_via_pack: free extra copy of refs Jeff King
@ 2015-03-19 20:39 ` Jeff King
3 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2015-03-19 20:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle J. McKay, Git mailing list
In everything_local(), we used to assign the current ref's value
found in ref->old_sha1 to ref->new_sha1 when we already have all the
necessary objects to complete the history leading to that
commit. This copying was broken at 49bb805e (Do not ask for
objects known to be complete., 2005-10-19) and ever since we
instead stuffed a random bytes in ref->new_sha1 here. No
code complained or failed due to this breakage.
It turns out that no code path that comes after this
assignment even looks at ref->new_sha1 at all.
- The only caller of everything_local(), do_fetch_pack(),
returns this list of refs, whose element has bogus
new_sha1 values, to its caller. It does not look at the
elements itself, but does pass them to find_common, which
looks only at the name and old_sha1 fields.
- The only caller of do_fetch_pack(), fetch_pack(), returns this
list to its caller. It does not look at the elements nor act on
them.
- One of the two callers of fetch_pack() is cmd_fetch_pack(), the
top-level that implements "git fetch-pack". The only thing it
looks at in the elements of the returned ref list is the old_sha1
and name fields.
- The other caller of fetch_pack() is fetch_refs_via_pack() in the
transport layer, which is a helper that implements "git fetch".
It only cares about whether the returned list is empty (i.e.
failed to fetch anything).
Just drop the bogus assignment, that is not even necessary. The
remote-tracking refs are updated based on a different list and not
using the ref list being manipulated by this code path; the caller
do_fetch_pack() created a copy of that real ref list and passed the
copy down to this function, and modifying the elements here does not
affect anything.
Noticed-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
fetch-pack.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 84d5a66..48526aa 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -628,7 +628,6 @@ static int everything_local(struct fetch_pack_args *args,
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const unsigned char *remote = ref->old_sha1;
- unsigned char local[20];
struct object *o;
o = lookup_object(remote);
@@ -641,8 +640,6 @@ static int everything_local(struct fetch_pack_args *args,
ref->name);
continue;
}
-
- hashcpy(ref->new_sha1, local);
if (!args->verbose)
continue;
fprintf(stderr,
--
2.3.3.520.g3cfbb5d
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage
2015-03-19 20:34 ` [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage Jeff King
@ 2015-03-19 21:09 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2015-03-19 21:09 UTC (permalink / raw)
To: Jeff King; +Cc: Kyle J. McKay, Git mailing list
Jeff King <peff@peff.net> writes:
> I wonder if the thinking in the original was that it was our
> responsibility here to make sure that ref->old_sha1 was filled in.
I am reasonably sure that is the (perhaps mistaken) reasoning behind
the use of old_sha1 as the second parameter to get_sha1_hex().
> It is
> always filled in by the caller who gives us "sought", which makes sense
> to me (this matches the rest of the "sought" entries, which come from
> reading the remote's ref list, and of course must fill in old_sha1 from
> that list).
I see that sought is populated by reading the command line of "git
fetch-pack", and for a 40-hex request ref->old_sha1 is already
filled there, so I agree that it is redundant to try filling it in
filter_refs().
Thanks.
> fetch-pack.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 655ee64..058c258 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -544,10 +544,14 @@ static void filter_refs(struct fetch_pack_args *args,
> /* Append unmatched requests to the list */
> if (allow_tip_sha1_in_want) {
> for (i = 0; i < nr_sought; i++) {
> + unsigned char sha1[20];
> +
> ref = sought[i];
> if (ref->matched)
> continue;
> - if (get_sha1_hex(ref->name, ref->old_sha1))
> + if (get_sha1_hex(ref->name, sha1) ||
> + ref->name[40] != '\0' ||
> + hashcmp(sha1, ref->old_sha1))
> continue;
>
> ref->matched = 1;
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-03-19 21:09 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-15 6:37 Bug in fetch-pack.c, please confirm Kyle J. McKay
2015-03-15 7:27 ` Junio C Hamano
2015-03-15 7:30 ` Junio C Hamano
2015-03-15 22:21 ` Kyle J. McKay
2015-03-16 1:13 ` Jeff King
2015-03-19 17:41 ` Junio C Hamano
2015-03-19 18:55 ` Jeff King
2015-03-19 19:01 ` Junio C Hamano
2015-03-19 20:31 ` Jeff King
2015-03-19 20:34 ` [PATCH 1/4] filter_ref: avoid overwriting ref->old_sha1 with garbage Jeff King
2015-03-19 21:09 ` Junio C Hamano
2015-03-19 20:37 ` [PATCH 2/4] filter_ref: make a copy of extra "sought" entries Jeff King
2015-03-19 20:38 ` [PATCH 3/4] fetch_refs_via_pack: free extra copy of refs Jeff King
2015-03-19 20:39 ` [PATCH 4/4] fetch-pack: remove dead assignment to ref->new_sha1 Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).