http-protocol question

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* http-protocol question
@ 2014-12-02  2:17 Bryan Turner
  2014-12-02  3:34 ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: Bryan Turner @ 2014-12-02  2:17 UTC (permalink / raw)
  To: Git Users

In Documentation/technical/http-protocol.txt, there is the following statement:

"S: Parse the git-upload-pack request:

Verify all objects in `want` are directly reachable from refs.

The server MAY walk backwards through history or through
the reflog to permit slightly stale requests."

Is there actually logic somewhere in Git that does that "MAY walk
backwards" step? Or is that something another implementation of Git
may do but the standard Git does not?

Thanks,
Bryan Turner

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  2:17 http-protocol question Bryan Turner
@ 2014-12-02  3:34 ` Jonathan Nieder
  2014-12-02  4:28   ` Bryan Turner
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2014-12-02  3:34 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users

Hi Bryan,

Bryan Turner wrote:

> Is there actually logic somewhere in Git that does that "MAY walk
> backwards" step?

Yes.  See upload-pack.c::check_non_tip and
http://thread.gmane.org/gmane.comp.version-control.git/178814.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  3:34 ` Jonathan Nieder
@ 2014-12-02  4:28   ` Bryan Turner
  2014-12-02  4:29     ` Bryan Turner
  2014-12-02  4:45     ` Jonathan Nieder
  0 siblings, 2 replies; 14+ messages in thread
From: Bryan Turner @ 2014-12-02  4:28 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Users

On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi Bryan,
>
> Bryan Turner wrote:
>
>> Is there actually logic somewhere in Git that does that "MAY walk
>> backwards" step?
>
> Yes.  See upload-pack.c::check_non_tip and
> http://thread.gmane.org/gmane.comp.version-control.git/178814.

Jonathan,

Thanks for the reply! I realize now I didn't really cite the part of
that documentation that I'm most interested in, which is: "through
history _or through the reflog_". It's the reflog bit I'm looking for.
Sorry for not being more clear. check_non_tip appears to look at
ancestry, walking back up (or down, depending on your view) the DAG to
see if the requested SHA-1 is reachable from a current ref, so it
looks like that's covering the "through history" part of that blurb.

The reason I ask is that there is a race condition that exists where
the ref advertisement lists refs/heads/foo at abc1234, and then foo is
deleted before the actual upload-pack request comes in. In that
situation, walking backwards through _history_ will only allow the
upload-pack to complete unless the deleted ref was merged into another
ref prior to being deleted (if I understand the code correctly). It
seems like looking at the reflogs, and seeing that refs/heads/foo did
in fact exist and did in fact refer to abc1234, is the only approach
that could allow the upload-pack to complete. The documentation hints
that such a thing is possible, but I don't see any code in Git that
seems to do that.

Does that make more sense? Does that functionality exist and I've just
overlooked it?

Thanks again,
Bryan Turner

>
> Hope that helps,
> Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  4:28   ` Bryan Turner
@ 2014-12-02  4:29     ` Bryan Turner
  2014-12-02  4:45     ` Jonathan Nieder
  1 sibling, 0 replies; 14+ messages in thread
From: Bryan Turner @ 2014-12-02  4:29 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Users

On Tue, Dec 2, 2014 at 3:28 PM, Bryan Turner <bturner@atlassian.com> wrote:
> On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Hi Bryan,
>>
>> Bryan Turner wrote:
>>
>>> Is there actually logic somewhere in Git that does that "MAY walk
>>> backwards" step?
>>
>> Yes.  See upload-pack.c::check_non_tip and
>> http://thread.gmane.org/gmane.comp.version-control.git/178814.
>
> Jonathan,
>
> Thanks for the reply! I realize now I didn't really cite the part of
> that documentation that I'm most interested in, which is: "through
> history _or through the reflog_". It's the reflog bit I'm looking for.
> Sorry for not being more clear. check_non_tip appears to look at
> ancestry, walking back up (or down, depending on your view) the DAG to
> see if the requested SHA-1 is reachable from a current ref, so it
> looks like that's covering the "through history" part of that blurb.
>
> The reason I ask is that there is a race condition that exists where
> the ref advertisement lists refs/heads/foo at abc1234, and then foo is
> deleted before the actual upload-pack request comes in. In that
> situation, walking backwards through _history_ will only allow the
> upload-pack to complete unless the deleted ref was merged into another

s/unless/if, sorry. "In that situation, walking backwards through
history will only allow the upload-pack to complete if the deleted ref
was merged into another ref."

> ref prior to being deleted (if I understand the code correctly). It
> seems like looking at the reflogs, and seeing that refs/heads/foo did
> in fact exist and did in fact refer to abc1234, is the only approach
> that could allow the upload-pack to complete. The documentation hints
> that such a thing is possible, but I don't see any code in Git that
> seems to do that.
>
> Does that make more sense? Does that functionality exist and I've just
> overlooked it?
>
> Thanks again,
> Bryan Turner
>
>>
>> Hope that helps,
>> Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  4:28   ` Bryan Turner
  2014-12-02  4:29     ` Bryan Turner
@ 2014-12-02  4:45     ` Jonathan Nieder
  2014-12-02  5:04       ` Bryan Turner
  1 sibling, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2014-12-02  4:45 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users

Hi,

Bryan Turner wrote:

> The reason I ask is that there is a race condition that exists where
> the ref advertisement lists refs/heads/foo at abc1234, and then foo is
> deleted before the actual upload-pack request comes in.

Can you say more about the context?  For example, was this actually
happening, or is this for the sake of understanding the protocol
better?

I ask because knowing the context might help us give more specific
advice.

Sometimes when people delete a branch they really want those objects
to be inaccessible *right away*.  So for such people, git's behavior
of failing the request unless the objects are still accessible by
some other path is helpful.

A stateful server could theoretically cache the list of objects they
have advertised for a short time, to avoid clients having to suffer
when something becomes inaccessible during the window between the
upload-pack advertisement and upload-pack request.  Or a permissive
server can allow all wants except a specific blacklist (and some
people do that).

Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  4:45     ` Jonathan Nieder
@ 2014-12-02  5:04       ` Bryan Turner
  2014-12-02  5:17         ` Jonathan Nieder
  2014-12-02  5:33         ` Jeff King
  0 siblings, 2 replies; 14+ messages in thread
From: Bryan Turner @ 2014-12-02  5:04 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Users

On Tue, Dec 2, 2014 at 3:45 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Bryan Turner wrote:
>
>> The reason I ask is that there is a race condition that exists where
>> the ref advertisement lists refs/heads/foo at abc1234, and then foo is
>> deleted before the actual upload-pack request comes in.
>
> Can you say more about the context?  For example, was this actually
> happening, or is this for the sake of understanding the protocol
> better?

I ask because it's actually happening. Heavy CI load sometimes has
builds fail because git clone dies with "not our ref". That's the
specific context I'm working to try and address right now. Some teams
use rebase-heavy workflows, which also evades the check_non_tip easing
and fails with "not our ref", so I can't be 100% certain it's ref
deletion in specific causing it (but I guess which of those it is is
probably largely academic; as long as hosting spans multiple requests
it seems like this type of race condition is unavoidable).

I'm trying to decide if there is something I can enable or tune to
prevent it, and the type of twilighting hinted at by the http-protocol
documentation seemed like it could be just the thing.

>
> I ask because knowing the context might help us give more specific
> advice.
>
> Sometimes when people delete a branch they really want those objects
> to be inaccessible *right away*.  So for such people, git's behavior
> of failing the request unless the objects are still accessible by
> some other path is helpful.
>
> A stateful server could theoretically cache the list of objects they
> have advertised for a short time, to avoid clients having to suffer
> when something becomes inaccessible during the window between the
> upload-pack advertisement and upload-pack request.  Or a permissive
> server can allow all wants except a specific blacklist (and some
> people do that).
>
> Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:04       ` Bryan Turner
@ 2014-12-02  5:17         ` Jonathan Nieder
  2014-12-02  5:30           ` Duy Nguyen
  2014-12-02  5:37           ` Duy Nguyen
  2014-12-02  5:33         ` Jeff King
  1 sibling, 2 replies; 14+ messages in thread
From: Jonathan Nieder @ 2014-12-02  5:17 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users

Bryan Turner wrote:

> I ask because it's actually happening. Heavy CI load sometimes has
> builds fail because git clone dies with "not our ref". That's the
> specific context I'm working to try and address right now.

Thanks --- that makes sense.

>                                                            Some teams
> use rebase-heavy workflows, which also evades the check_non_tip easing
> and fails with "not our ref", so I can't be 100% certain it's ref
> deletion in specific causing it (but I guess which of those it is is
> probably largely academic; as long as hosting spans multiple requests
> it seems like this type of race condition is unavoidable).

Yeah, everything mentioned before about ref deletion also applies to
non-fast-forward updates.

> I'm trying to decide if there is something I can enable or tune to
> prevent it, and the type of twilighting hinted at by the http-protocol
> documentation seemed like it could be just the thing.

If you control the side that clones, then just cloning the single branch
you are building (with --single-branch and -b) can help.

Using a bidirectional protocol like git:// or ssh:// (where the ref
advertisement and check of wants happen in the same connection) would
avoid the problem we're talking about.

On the server side, I agree that either mining reflogs or storing
advertisements to disk would be a nice way to take care of this.
No one has implemented either of those, but it would be a nice setting
to have. :)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:17         ` Jonathan Nieder
@ 2014-12-02  5:30           ` Duy Nguyen
  2014-12-02  5:37           ` Duy Nguyen
  1 sibling, 0 replies; 14+ messages in thread
From: Duy Nguyen @ 2014-12-02  5:30 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Bryan Turner, Git Users

On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> On the server side, I agree that either mining reflogs or storing
> advertisements to disk would be a nice way to take care of this.
> No one has implemented either of those, but it would be a nice setting
> to have. :)

and could be dangerous too. If I accidentally add a password file,
then delete it (way before a clone is performed), I don't want that
part of reflog visible to the cloner. Problem is we don't know how far
we should look back in reflog unless we keep some state.
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:04       ` Bryan Turner
  2014-12-02  5:17         ` Jonathan Nieder
@ 2014-12-02  5:33         ` Jeff King
  2014-12-02  5:47           ` Bryan Turner
  2014-12-02 17:45           ` Junio C Hamano
  1 sibling, 2 replies; 14+ messages in thread
From: Jeff King @ 2014-12-02  5:33 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Jonathan Nieder, Git Users

On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote:

> > Can you say more about the context?  For example, was this actually
> > happening, or is this for the sake of understanding the protocol
> > better?
> 
> I ask because it's actually happening. Heavy CI load sometimes has
> builds fail because git clone dies with "not our ref". That's the
> specific context I'm working to try and address right now. Some teams
> use rebase-heavy workflows, which also evades the check_non_tip easing
> and fails with "not our ref", so I can't be 100% certain it's ref
> deletion in specific causing it (but I guess which of those it is is
> probably largely academic; as long as hosting spans multiple requests
> it seems like this type of race condition is unavoidable).

There is a practical reason to care. Ref deletion will also delete the
reflog, leaving no trace of the reachability. Whereas a non-fast-forward
push could be resolved by looking in the reflog.

One problem with hunting for sha1s in the reflog is that upload-pack
does not know which ref the client thinks they are requesting. So a
search would involve looking in _every_ reflog, which could be
expensive. It might not be too painful if you do the search only after
hitting a "not our ref" condition, which should in theory be rare. A
malicious client could convince you to grep your reflogs repeatedly, but
that is hardly the only way to convince upload-pack to chew CPU. Asking
it to make a pack comes to mind. :)

> I'm trying to decide if there is something I can enable or tune to
> prevent it, and the type of twilighting hinted at by the http-protocol
> documentation seemed like it could be just the thing.

For a public repository, it might make sense to provide a config option
to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
But such an option does not yet exist.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:17         ` Jonathan Nieder
  2014-12-02  5:30           ` Duy Nguyen
@ 2014-12-02  5:37           ` Duy Nguyen
  1 sibling, 0 replies; 14+ messages in thread
From: Duy Nguyen @ 2014-12-02  5:37 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Bryan Turner, Git Users

On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> I'm trying to decide if there is something I can enable or tune to
>> prevent it, and the type of twilighting hinted at by the http-protocol
>> documentation seemed like it could be just the thing.
>
> If you control the side that clones, then just cloning the single branch
> you are building (with --single-branch and -b) can help.

Or we could extend this a bit on the server side, if one ref fail, let
the clone continue with the remaining refs (and warn loudly to the
user).
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:33         ` Jeff King
@ 2014-12-02  5:47           ` Bryan Turner
  2014-12-02  5:52             ` Jeff King
  2014-12-02 17:45           ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Bryan Turner @ 2014-12-02  5:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Git Users

On Tue, Dec 2, 2014 at 4:33 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote:
>
>> > Can you say more about the context?  For example, was this actually
>> > happening, or is this for the sake of understanding the protocol
>> > better?
>>
>> I ask because it's actually happening. Heavy CI load sometimes has
>> builds fail because git clone dies with "not our ref". That's the
>> specific context I'm working to try and address right now. Some teams
>> use rebase-heavy workflows, which also evades the check_non_tip easing
>> and fails with "not our ref", so I can't be 100% certain it's ref
>> deletion in specific causing it (but I guess which of those it is is
>> probably largely academic; as long as hosting spans multiple requests
>> it seems like this type of race condition is unavoidable).
>
> There is a practical reason to care. Ref deletion will also delete the
> reflog, leaving no trace of the reachability. Whereas a non-fast-forward
> push could be resolved by looking in the reflog.

A fair point. I had mistakenly thought that reflogs would survive the
ref's deletion and be "pruned" as part of garbage collection, but a
quick test shows that, as I'm sure you already know, that's not true.

Thanks for correcting my mistake!
Bryan Turner

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:47           ` Bryan Turner
@ 2014-12-02  5:52             ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2014-12-02  5:52 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Jonathan Nieder, Git Users

On Tue, Dec 02, 2014 at 04:47:50PM +1100, Bryan Turner wrote:

> > There is a practical reason to care. Ref deletion will also delete the
> > reflog, leaving no trace of the reachability. Whereas a non-fast-forward
> > push could be resolved by looking in the reflog.
> 
> A fair point. I had mistakenly thought that reflogs would survive the
> ref's deletion and be "pruned" as part of garbage collection, but a
> quick test shows that, as I'm sure you already know, that's not true.

I wish it worked that way. Unfortunately there are complications with
keeping the old reflogs in place, because they sometimes cause conflicts
with new refs being created (e.g., a reflog in ".git/logs/refs/heads/foo"
would prevent ".git/logs/refs/heads/foo/bar" from being created). I had
some patches long ago to try to keep a "reflog graveyard" around, but
they were quite invasive, and there were some corner cases that caused
weird errors.

Handling this sort of D/F conflict more gracefully is one of the things
I'd like to experiment with once we have pluggable ref backends (I think
we'll also disallow "foo/bar" if "foo" exists, but the storage could at
least keep the reflogs around).

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02  5:33         ` Jeff King
  2014-12-02  5:47           ` Bryan Turner
@ 2014-12-02 17:45           ` Junio C Hamano
  2014-12-02 19:50             ` Jeff King
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2014-12-02 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Bryan Turner, Jonathan Nieder, Git Users

Jeff King <peff@peff.net> writes:

> For a public repository, it might make sense to provide a config option
> to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
> But such an option does not yet exist.

In principle, yes, but that cannot be has_sha1_file(); it has to
have a fully connected healthy history behind it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: http-protocol question
  2014-12-02 17:45           ` Junio C Hamano
@ 2014-12-02 19:50             ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2014-12-02 19:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Bryan Turner, Jonathan Nieder, Git Users

On Tue, Dec 02, 2014 at 09:45:06AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > For a public repository, it might make sense to provide a config option
> > to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
> > But such an option does not yet exist.
> 
> In principle, yes, but that cannot be has_sha1_file(); it has to
> have a fully connected healthy history behind it.

I thought about that, but I wonder if it matters whether we check up
front. We are not advertising it, but only trying our best to fulfill
the "want" from the other side.  So either:

  1. We check immediately whether we have the full history behind it,
     and reject the "want" otherwise.

  2. We start pack-objects on it, and the first thing it will do is
     collect the full set of objects to send. If it doesn't have them,
     it will barf.

Probably (1) would produce nicer error messages, but (2) is a lot more
efficient.

I dunno. I am not volunteering to implement, so I will leave thinking on
it more to somebody who wants to do so. :)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-12-02 19:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-02  2:17 http-protocol question Bryan Turner
2014-12-02  3:34 ` Jonathan Nieder
2014-12-02  4:28   ` Bryan Turner
2014-12-02  4:29     ` Bryan Turner
2014-12-02  4:45     ` Jonathan Nieder
2014-12-02  5:04       ` Bryan Turner
2014-12-02  5:17         ` Jonathan Nieder
2014-12-02  5:30           ` Duy Nguyen
2014-12-02  5:37           ` Duy Nguyen
2014-12-02  5:33         ` Jeff King
2014-12-02  5:47           ` Bryan Turner
2014-12-02  5:52             ` Jeff King
2014-12-02 17:45           ` Junio C Hamano
2014-12-02 19:50             ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).