git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] use lock token in non-URI form in start_put
@ 2009-02-07 19:27 Tay Ray Chuan
  2009-02-07 20:20 ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Tay Ray Chuan @ 2009-02-07 19:27 UTC (permalink / raw)
  To: git

After 753bc91 ("Remove the requirement opaquelocktoken uri scheme"),
lock tokens are in the URI forms in which they are received from the
server, eg. 'opaquelocktoken:', 'uuid:'

However, "start_put" (and consequently "start_move"), which attempts to
create a unique temporary file using the UUID of the lock token,
inadvertently uses the lock token in its URI form. These file
operations on the server may not be successful (specifically, in
Windows), due to the colon ':' character from the URI form of the lock
token in the file path.

This patch ensures that the lock token sans the URI scheme is used
instead in "start_put".

To do this, the "start_put" gets the position of ':', which is used to
separate the URI scheme from the part, eg. "<scheme>:". In addition,
start_put uses the last position of ':', since URIs with component
URIs are possible, eg. "urn:uuid:" One can be sure that the lock token
will always contain the UUID and be in URI form, due to RFC 2518, or
its successor RFC 4918 (see
http://www.webdav.org/specs/rfc4918.html#ELEMENT_locktoken).

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 http-push.c          |    2 +-
 t/t5540-http-push.sh |    7 +++++++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/http-push.c b/http-push.c
index eefd64c..bd8f372 100644
--- a/http-push.c
+++ b/http-push.c
@@ -558,7 +558,7 @@ static void start_put(struct transfer_request *request)

 	append_remote_object_url(&buf, remote->url, hex, 0);
 	strbuf_addstr(&buf, "_");
-	strbuf_addstr(&buf, request->lock->token);
+	strbuf_addstr(&buf, strrchr(request->lock->token, ':') + 1);
 	request->url = strbuf_detach(&buf, NULL);

 	slot = get_active_slot();
diff --git a/t/t5540-http-push.sh b/t/t5540-http-push.sh
index c236b5e..268b2d4 100755
--- a/t/t5540-http-push.sh
+++ b/t/t5540-http-push.sh
@@ -94,6 +94,13 @@ test_expect_success 'MKCOL sends directory names with trailing slashes' '

 '

+test_expect_success 'PUT and MOVE sends object to URLs in non-URI form' '
+
+	grep -P "\"(?:PUT|MOVE) .+objects/[\da-z]{2}/[\da-z]{38}_[\da-z\-]{36} HTTP/[0-9.]+\" 20\d" \
+	< "$HTTPD_ROOT_PATH"/access.log
+
+'
+
 stop_httpd

 test_done
-- 
1.6.1.2.278.g9a9e.dirty

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-07 19:27 [PATCH] use lock token in non-URI form in start_put Tay Ray Chuan
@ 2009-02-07 20:20 ` Johannes Schindelin
  2009-02-07 20:40   ` Junio C Hamano
  2009-02-08  1:25   ` Tay Ray Chuan
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin @ 2009-02-07 20:20 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git

Hi,

On Sun, 8 Feb 2009, Tay Ray Chuan wrote:

> After 753bc91 ("Remove the requirement opaquelocktoken uri scheme"),
> lock tokens are in the URI forms in which they are received from the
> server, eg. 'opaquelocktoken:', 'uuid:'
>
> However, "start_put" (and consequently "start_move"), which attempts to
> create a unique temporary file using the UUID of the lock token,
> inadvertently uses the lock token in its URI form. These file
> operations on the server may not be successful (specifically, in
> Windows), due to the colon ':' character from the URI form of the lock
> token in the file path.

If it is a prefix that happens to be part of the URI, but must not be used 
by the client code as a lock token, would it not be better to store the 
token in lock->token to begin with?

> To do this, the "start_put" gets the position of ':', which is used to
> separate the URI scheme from the part, eg. "<scheme>:". In addition,
> start_put uses the last position of ':', since URIs with component
> URIs are possible, eg. "urn:uuid:" One can be sure that the lock token
> will always contain the UUID and be in URI form, due to RFC 2518, or
> its successor RFC 4918 (see
> http://www.webdav.org/specs/rfc4918.html#ELEMENT_locktoken).
> 
> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
> 
> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
> ---
>  http-push.c          |    2 +-
>  t/t5540-http-push.sh |    7 +++++++
>  2 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/http-push.c b/http-push.c
> index eefd64c..bd8f372 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -558,7 +558,7 @@ static void start_put(struct transfer_request *request)
> 
>  	append_remote_object_url(&buf, remote->url, hex, 0);
>  	strbuf_addstr(&buf, "_");
> -	strbuf_addstr(&buf, request->lock->token);
> +	strbuf_addstr(&buf, strrchr(request->lock->token, ':') + 1);

This is unsafe.  What if lock->token does not contain a colon?  Even if it 
happens to be the case now, in your setup, it might change, or there might 
be mistakes in the server code.  We should always play it safe if we 
cannot control the other side's code.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-07 20:20 ` Johannes Schindelin
@ 2009-02-07 20:40   ` Junio C Hamano
  2009-02-08  1:45     ` Tay Ray Chuan
  2009-02-08  1:25   ` Tay Ray Chuan
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2009-02-07 20:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Tay Ray Chuan, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sun, 8 Feb 2009, Tay Ray Chuan wrote:
>
>> After 753bc91 ("Remove the requirement opaquelocktoken uri scheme"),
>> lock tokens are in the URI forms in which they are received from the
>> server, eg. 'opaquelocktoken:', 'uuid:'
>>
>> However, "start_put" (and consequently "start_move"), which attempts to
>> create a unique temporary file using the UUID of the lock token,
>> inadvertently uses the lock token in its URI form. These file
>> operations on the server may not be successful (specifically, in
>> Windows), due to the colon ':' character from the URI form of the lock
>> token in the file path.
>
> If it is a prefix that happens to be part of the URI, but must not be used 
> by the client code as a lock token, would it not be better to store the 
> token in lock->token to begin with?

Let me show more of my ignorance around this codepath.

What is the real purpose of this appending?  My understanding is that it
is to ensure that a tentative PUT goes to a new file, so that a DAV server
whose PUT is not atomic (i.e. can leave a half-written bogosity when the
operation fails for whatever reason) won't leave a broken object in its
object store.  We MOVE it to its final destination and expect that to be
atomic.

Does the server side guarantee that the lock token string is unique in the
sense that it does not reuse a token that was used for a recent
transaction that was aborted?  If there is no such guarantee, then using
(a part of) the lock token as the string we append is no better than using
a random string we choose ourselves.

We may need to send the exact lock token back for unlocking the
transaction we started, but I do not think it necessarily is a good idea
to tie that to the random string we would use for PUT-then-MOVE operation.

RFC 4918 (section 6.5) explicitly states that the servers are free to use
any URI so long as it meets the uniqueness requirements, so relying on it
being any form of uuid does not sound like a good idea.  It can contain
not just a colon but other potentially problematic characters, such as a
slash, no?

That is why I asked in my previous question what in the codepath protects
ourselves from using problematic characters.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-07 20:20 ` Johannes Schindelin
  2009-02-07 20:40   ` Junio C Hamano
@ 2009-02-08  1:25   ` Tay Ray Chuan
  1 sibling, 0 replies; 7+ messages in thread
From: Tay Ray Chuan @ 2009-02-08  1:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi,

On Sun, Feb 8, 2009 at 4:20 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> If it is a prefix that happens to be part of the URI, but must not be used
> by the client code as a lock token, would it not be better to store the
> token in lock->token to begin with?

The URI form of the lock token is suitable for all other occurrences
lock token usage; that is, only start_put needs the non-URI form,
while the rest use the URI form.

That's why I only changed how start_put uses the lock token; changing
lock token just for the sake of start_put doesn't seem very effective
to me.

> This is unsafe.  What if lock->token does not contain a colon?  Even if it
> happens to be the case now, in your setup, it might change, or there might
> be mistakes in the server code.  We should always play it safe if we
> cannot control the other side's code.

Point noted. I'll try to think of something else.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-07 20:40   ` Junio C Hamano
@ 2009-02-08  1:45     ` Tay Ray Chuan
  2009-02-08  2:28       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Tay Ray Chuan @ 2009-02-08  1:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

Hi,

On Sun, Feb 8, 2009 at 4:40 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Let me show more of my ignorance around this codepath.
>
> What is the real purpose of this appending?  My understanding is that it
> is to ensure that a tentative PUT goes to a new file, so that a DAV server
> whose PUT is not atomic (i.e. can leave a half-written bogosity when the
> operation fails for whatever reason) won't leave a broken object in its
> object store.  We MOVE it to its final destination and expect that to be
> atomic.

This happens to be my understanding as well.

> Does the server side guarantee that the lock token string is unique in the
> sense that it does not reuse a token that was used for a recent
> transaction that was aborted?  If there is no such guarantee, then using
> (a part of) the lock token as the string we append is no better than using
> a random string we choose ourselves.

In section 6.5 which you cite below, the token is unique, and we hold
the server's word for it.

> We may need to send the exact lock token back for unlocking the
> transaction we started, but I do not think it necessarily is a good idea
> to tie that to the random string we would use for PUT-then-MOVE operation.
>
> RFC 4918 (section 6.5) explicitly states that the servers are free to use
> any URI so long as it meets the uniqueness requirements, so relying on it
> being any form of uuid does not sound like a good idea.  It can contain
> not just a colon but other potentially problematic characters, such as a
> slash, no?
>
> That is why I asked in my previous question what in the codepath protects
> ourselves from using problematic characters.
>

You're right, section 6.5 doesn't enforce that lock tokens are UUIDs.

Any solutions to this? Perhaps one could have a function, say,
get_unique_remote_postfix, and the function would check for URI
schemes which we know are safe for use in file names, namely,
"urn:uuid:" and "opaqulocktoken:". However, if its a URI we are unsure
of, then it would generate a random string.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-08  1:45     ` Tay Ray Chuan
@ 2009-02-08  2:28       ` Junio C Hamano
  2009-02-08  3:28         ` Tay Ray Chuan
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2009-02-08  2:28 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Johannes Schindelin, git

Tay Ray Chuan <rctay89@gmail.com> writes:

> In section 6.5 which you cite below, the token is unique, and we hold
> the server's word for it.

Are you sure you are not reading too much into that guarantee?

I somehow thought that the natural reading of the guarantee is *not* that
the tokens are unique over the lifetime of the server installation (iow, a
lock token you obtained today will never be used in the future, and it is
a token that the server never has used before), but it merely guarantees
that there are no any two outstanding locks that share the same URI, lest
one client's unlock request breaks the wrong lock.  But I may be wrong here.

>> That is why I asked in my previous question what in the codepath protects
>> ourselves from using problematic characters.
>
> You're right, section 6.5 doesn't enforce that lock tokens are UUIDs.
>
> Any solutions to this? Perhaps one could have a function, say,
> get_unique_remote_postfix, and the function would check for URI
> schemes which we know are safe for use in file names, namely,
> "urn:uuid:" and "opaqulocktoken:". However, if its a URI we are unsure
> of, then it would generate a random string.

Let me show even more ignorance of mine with this codepath.

What breaks and how, if we do not even use a random string but a fixed
suffix ".temp" here?  I am not suggesting we actually do that, but I'd
like to see how important the uniqueness is here to better understand the
issue.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] use lock token in non-URI form in start_put
  2009-02-08  2:28       ` Junio C Hamano
@ 2009-02-08  3:28         ` Tay Ray Chuan
  0 siblings, 0 replies; 7+ messages in thread
From: Tay Ray Chuan @ 2009-02-08  3:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

Hi,

On Sun, Feb 8, 2009 at 10:28 AM, Junio C Hamano <gitster@pobox.com> wrote:
> I somehow thought that the natural reading of the guarantee is *not* that
> the tokens are unique over the lifetime of the server installation (iow, a
> lock token you obtained today will never be used in the future, and it is
> a token that the server never has used before), but it merely guarantees
> that there are no any two outstanding locks that share the same URI, lest
> one client's unlock request breaks the wrong lock.  But I may be wrong here.
>
> ...
>
> What breaks and how, if we do not even use a random string but a fixed
> suffix ".temp" here?  I am not suggesting we actually do that, but I'd
> like to see how important the uniqueness is here to better understand the
> issue.

Having a temporary file dually ensures 1) no repository corruption 2)
no pushing simultaneously to a object file. I think an example of a
racy situation would help illustrate why a unique temporary file is
needed (pun not intended).

Imagine if two git clients, A and B, are trying to push to the
repository at the same time. Let's assume that A is our "official"
implementation of http-push, while B is a rogue, forked implementation
of A. B is rogue because, unlike A, it doesn't care about locks on
info/refs, refs/head/<branch>, etc. -- stuff that A checks for before
beginning pushing objects. But apart from that, A and B are similar in
all other aspects.

It so happens that they're pushing the same object, say, X, to the
same repository. A starts first, and during the period of the transfer
of object X, B starts too.

If we had used the ".temp" as you had suggested, we would have some
problems due to unexpected behaviour as a result of A's and B's
simultaneous writing to the file X.temp.

If we had used the token instead for the temporary file name, the
chances of this occurring is lowered, but still possible. That's
because we're using a token from a lock somewhat arbitrarily. For
example, if A was pushing to the branch "branch_A", it would lock
refs/heads/branch_A and use that lock token, and if B was pushing to
"branch_B", it would use the lock token obtained from locking
refs/heads/branch_B.

Going back to your first point on the uniqueness of the lock token,
one only needs a token that's unique for the locks currently in
effect. If the server had given us a token that had been used before,
but isn't used by anyone else *now*, then it's ok, since we wouldn't
be overwriting anybody else's file.

Thus, a token that's unique *now* is sufficient.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-02-08  3:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-07 19:27 [PATCH] use lock token in non-URI form in start_put Tay Ray Chuan
2009-02-07 20:20 ` Johannes Schindelin
2009-02-07 20:40   ` Junio C Hamano
2009-02-08  1:45     ` Tay Ray Chuan
2009-02-08  2:28       ` Junio C Hamano
2009-02-08  3:28         ` Tay Ray Chuan
2009-02-08  1:25   ` Tay Ray Chuan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).