xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] oxenstored: fix short-write issue
@ 2015-10-27 17:10 Wei Liu
  2015-10-27 17:21 ` Samuel Thibault
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Wei Liu @ 2015-10-27 17:10 UTC (permalink / raw)
  To: Xen-devel
  Cc: Ian Jackson, Samuel Thibault, Wei Liu, Ian Campbell, David Scott

When oxenstored wrote to the ring, it wrote a chunk of contiguous data.
Originally when it tried to write across ring boundary, it returned a
short-write when there is still room.  That led to stalling mini-os's
xenstore thread at times.

Fix this by calling write function for a second time when the first
write completes partially.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: David Scott <dave@recoil.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>

Ian, backport candidate for as far as you can manage.
---
 tools/ocaml/libs/xb/xb.ml | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 50944b5..0730d13 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -91,10 +91,12 @@ let write_fd back con s len =
 	Unix.write back.fd s 0 len
 
 let write_mmap back con s len =
-	let ws = Xs_ring.write back.mmap s len in
-	if ws > 0 then
+	let ws = ref (Xs_ring.write back.mmap s len) in
+	if !ws < len then
+		ws := !ws + Xs_ring.write back.mmap (String.sub s !ws (len - !ws)) (len - !ws);
+	if !ws > 0 then
 		back.eventchn_notify ();
-	ws
+	!ws
 
 let write con s len =
 	match con.backend with
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:10 [PATCH] oxenstored: fix short-write issue Wei Liu
@ 2015-10-27 17:21 ` Samuel Thibault
  2015-10-27 17:21 ` Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Samuel Thibault @ 2015-10-27 17:21 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Ian Jackson, Ian Campbell, David Scott

That being said,

Wei Liu, le Tue 27 Oct 2015 17:10:09 +0000, a écrit :
> @@ -91,10 +91,12 @@ let write_fd back con s len =
>  	Unix.write back.fd s 0 len
>  
>  let write_mmap back con s len =
> -	let ws = Xs_ring.write back.mmap s len in
> -	if ws > 0 then
> +	let ws = ref (Xs_ring.write back.mmap s len) in
> +	if !ws < len then

This could also be

   	if !ws > 0 && !ws < len then

And also perhaps add a comment:

        (* Xs_ring.write returned a short-write.  Perhaps it just hit
           the ring boundary and there is actually still room. Call it a
           second time to try to finish writing at the beginning of the
	   ring. *)

> +		ws := !ws + Xs_ring.write back.mmap (String.sub s !ws (len - !ws)) (len - !ws);
> +	if !ws > 0 then
>  		back.eventchn_notify ();
> -	ws
> +	!ws
>  
>  let write con s len =
>  	match con.backend with
> -- 
> 2.1.4
> 

-- 
Samuel
<N> un driver qui fait quoi, alors ?
<y> ben pour les bips
<s> pour passer les oops en morse
 -+- #ens-mim - vive les rapports de bug -+-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:10 [PATCH] oxenstored: fix short-write issue Wei Liu
  2015-10-27 17:21 ` Samuel Thibault
@ 2015-10-27 17:21 ` Andrew Cooper
  2015-10-27 17:26   ` Samuel Thibault
  2015-10-27 17:28   ` Samuel Thibault
  2015-10-28 14:04 ` David Vrabel
  2015-11-02 13:44 ` Ian Campbell
  3 siblings, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2015-10-27 17:21 UTC (permalink / raw)
  To: Wei Liu, Xen-devel
  Cc: Samuel Thibault, Ian Jackson, Ian Campbell, David Scott

On 27/10/15 17:10, Wei Liu wrote:
> When oxenstored wrote to the ring, it wrote a chunk of contiguous data.
> Originally when it tried to write across ring boundary, it returned a
> short-write when there is still room.  That led to stalling mini-os's
> xenstore thread at times.
>
> Fix this by calling write function for a second time when the first
> write completes partially.
>
> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
> Cc: David Scott <dave@recoil.org>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>
> Ian, backport candidate for as far as you can manage.
> ---
>  tools/ocaml/libs/xb/xb.ml | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 50944b5..0730d13 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -91,10 +91,12 @@ let write_fd back con s len =
>  	Unix.write back.fd s 0 len
>  
>  let write_mmap back con s len =
> -	let ws = Xs_ring.write back.mmap s len in
> -	if ws > 0 then
> +	let ws = ref (Xs_ring.write back.mmap s len) in
> +	if !ws < len then
> +		ws := !ws + Xs_ring.write back.mmap (String.sub s !ws (len - !ws)) (len - !ws);

This is surely a TOCTOU, as the second attempted write could return
short as well.

The correct behaviour would be for Xs_ring.write to return the actual
number of bytes it put into the ring, even if this is shorter than len.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:21 ` Andrew Cooper
@ 2015-10-27 17:26   ` Samuel Thibault
  2015-10-27 17:28   ` Samuel Thibault
  1 sibling, 0 replies; 15+ messages in thread
From: Samuel Thibault @ 2015-10-27 17:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, David Scott, Wei Liu, Ian Jackson, Ian Campbell

Andrew Cooper, le Tue 27 Oct 2015 17:21:39 +0000, a écrit :
> The correct behaviour would be for Xs_ring.write to return the actual
> number of bytes it put into the ring, even if this is shorter than len.

That is already what it does.

Samuel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:21 ` Andrew Cooper
  2015-10-27 17:26   ` Samuel Thibault
@ 2015-10-27 17:28   ` Samuel Thibault
  2015-10-27 17:31     ` Samuel Thibault
  2015-10-27 17:31     ` Andrew Cooper
  1 sibling, 2 replies; 15+ messages in thread
From: Samuel Thibault @ 2015-10-27 17:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, David Scott, Wei Liu, Ian Jackson, Ian Campbell

Andrew Cooper, le Tue 27 Oct 2015 17:21:39 +0000, a écrit :
> as the second attempted write could return short as well.

That is fine. The second attempt will only return a short write if there
was really not enough room for it, which is what we want.

Samuel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:28   ` Samuel Thibault
@ 2015-10-27 17:31     ` Samuel Thibault
  2015-10-27 17:31     ` Andrew Cooper
  1 sibling, 0 replies; 15+ messages in thread
From: Samuel Thibault @ 2015-10-27 17:31 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Xen-devel, Ian Jackson, Ian Campbell,
	David Scott

Another way would be to make ml_interface_write write both pieces
instead of just a contiguous one.  The caml code would then look less
suspicious :)

Samuel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:28   ` Samuel Thibault
  2015-10-27 17:31     ` Samuel Thibault
@ 2015-10-27 17:31     ` Andrew Cooper
  2015-10-28 13:34       ` David Scott
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-10-27 17:31 UTC (permalink / raw)
  To: Samuel Thibault, Wei Liu, Xen-devel, Ian Jackson, Ian Campbell,
	David Scott

On 27/10/15 17:28, Samuel Thibault wrote:
> Andrew Cooper, le Tue 27 Oct 2015 17:21:39 +0000, a écrit :
>> as the second attempted write could return short as well.
> That is fine. The second attempt will only return a short write if there
> was really not enough room for it, which is what we want.

Then surely the bug is that Xs_ring.write returns short when it shouldn't?

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:31     ` Andrew Cooper
@ 2015-10-28 13:34       ` David Scott
  2015-10-28 13:47         ` Samuel Thibault
  2015-10-28 14:00         ` Andrew Cooper
  0 siblings, 2 replies; 15+ messages in thread
From: David Scott @ 2015-10-28 13:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Samuel Thibault, Ian Campbell, Wei Liu, Ian Jackson, Xen-devel



> On 27 Oct 2015, at 17:31, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> On 27/10/15 17:28, Samuel Thibault wrote:
>> Andrew Cooper, le Tue 27 Oct 2015 17:21:39 +0000, a écrit :
>>> as the second attempted write could return short as well.
>> That is fine. The second attempt will only return a short write if there
>> was really not enough room for it, which is what we want.
> 
> Then surely the bug is that Xs_ring.write returns short when it shouldn’t?


> On 27 Oct 2015, at 17:31, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:
> 
> Another way would be to make ml_interface_write write both pieces
> instead of just a contiguous one.  The caml code would then look less
> suspicious :)
> 

I agree with both Andrew and Samuel that it would be a better fix if Xs_ring.write (== “ml_interface_write”) wrote both pieces at once.

However I believe the ‘suspicious’ OCaml patch fixes the specific issue (— or have I missed something?).

Last time I meddled with the C-level ring reading/writing code I didn’t get it quite right. Does anyone have time to prototype what a C-level fix would look like? If we’re short of time I could live with the OCaml-level fix, especially since Wei has done some stress-testing (assuming everyone believes it fixes the issue)

Cheers,
Dave

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-28 13:34       ` David Scott
@ 2015-10-28 13:47         ` Samuel Thibault
  2015-10-28 14:00         ` Andrew Cooper
  1 sibling, 0 replies; 15+ messages in thread
From: Samuel Thibault @ 2015-10-28 13:47 UTC (permalink / raw)
  To: David Scott; +Cc: Andrew Cooper, Ian Campbell, Wei Liu, Ian Jackson, Xen-devel

[-- Attachment #1: Type: text/plain, Size: 2121 bytes --]

David Scott, le Wed 28 Oct 2015 13:34:10 +0000, a écrit :
> However I believe the ‘suspicious’ OCaml patch fixes the specific issue (— or have I missed something?).

It does.

> Does anyone have time to prototype what a C-level fix would look like?

Here is a try, untested though.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index fd561a2..5c771d6 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -99,12 +99,12 @@ CAMLprim value ml_interface_write(value ml_interface,
 	int result;

 	struct xenstore_domain_interface *intf = interface->addr;
-	XENSTORE_RING_IDX cons, prod;
+	XENSTORE_RING_IDX cons, old_prod, prod;
 	int can_write;
 	uint32_t connection;

 	cons = *(volatile uint32_t*)&intf->rsp_cons;
-	prod = *(volatile uint32_t*)&intf->rsp_prod;
+	old_prod = prod = *(volatile uint32_t*)&intf->rsp_prod;
 	connection = *(volatile uint32_t*)&intf->connection;

 	if (connection != XENSTORE_CONNECTED)
@@ -116,15 +116,30 @@ CAMLprim value ml_interface_write(value ml_interface,
 		goto exit;
 	}
 	if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
+	{
 		can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
-	else
+		if (can_write < len)
+		{
+			/* Not enough room at end of ring.  Copy what we can first.  */
+			memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, can_write);
+			prod += can_write;
+			len -= can_write;
+			buffer += can_write;
+
+			/* We are now back at beginning of ring.  */
+			can_write = MASK_XENSTORE_IDX(cons) - MASK_XENSTORE_IDX(prod);
+		}
+	}
+	else
 		can_write = MASK_XENSTORE_IDX(cons) - MASK_XENSTORE_IDX(prod);
 	if (can_write < len)
+		/* Real short write: not enough room.  */
 		len = can_write;
 	memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
+	prod += len;
 	xen_mb();
-	intf->rsp_prod += len;
-	result = len;
+	intf->rsp_prod = prod;
+	result = prod - old_prod;
 exit:
 	ml_result = Val_int(result);
 	CAMLreturn(ml_result);

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1708 bytes --]

diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index fd561a2..5c771d6 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -99,12 +99,12 @@ CAMLprim value ml_interface_write(value ml_interface,
 	int result;
 
 	struct xenstore_domain_interface *intf = interface->addr;
-	XENSTORE_RING_IDX cons, prod;
+	XENSTORE_RING_IDX cons, old_prod, prod;
 	int can_write;
 	uint32_t connection;
 
 	cons = *(volatile uint32_t*)&intf->rsp_cons;
-	prod = *(volatile uint32_t*)&intf->rsp_prod;
+	old_prod = prod = *(volatile uint32_t*)&intf->rsp_prod;
 	connection = *(volatile uint32_t*)&intf->connection;
 
 	if (connection != XENSTORE_CONNECTED)
@@ -116,15 +116,30 @@ CAMLprim value ml_interface_write(value ml_interface,
 		goto exit;
 	}
 	if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
+	{
 		can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
-	else 
+		if (can_write < len)
+		{
+			/* Not enough room at end of ring.  Copy what we can first.  */
+			memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, can_write);
+			prod += can_write;
+			len -= can_write;
+			buffer += can_write;
+
+			/* We are now back at beginning of ring.  */
+			can_write = MASK_XENSTORE_IDX(cons) - MASK_XENSTORE_IDX(prod);
+		}
+	}
+	else
 		can_write = MASK_XENSTORE_IDX(cons) - MASK_XENSTORE_IDX(prod);
 	if (can_write < len)
+		/* Real short write: not enough room.  */
 		len = can_write;
 	memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
+	prod += len;
 	xen_mb();
-	intf->rsp_prod += len;
-	result = len;
+	intf->rsp_prod = prod;
+	result = prod - old_prod;
 exit:
 	ml_result = Val_int(result);
 	CAMLreturn(ml_result);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-28 13:34       ` David Scott
  2015-10-28 13:47         ` Samuel Thibault
@ 2015-10-28 14:00         ` Andrew Cooper
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2015-10-28 14:00 UTC (permalink / raw)
  To: David Scott
  Cc: Ian Jackson, Samuel Thibault, Wei Liu, Ian Campbell, Xen-devel

On 28/10/15 13:34, David Scott wrote:
>
>> On 27 Oct 2015, at 17:31, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 27/10/15 17:28, Samuel Thibault wrote:
>>> Andrew Cooper, le Tue 27 Oct 2015 17:21:39 +0000, a écrit :
>>>> as the second attempted write could return short as well.
>>> That is fine. The second attempt will only return a short write if there
>>> was really not enough room for it, which is what we want.
>> Then surely the bug is that Xs_ring.write returns short when it shouldn’t?
>
>> On 27 Oct 2015, at 17:31, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:
>>
>> Another way would be to make ml_interface_write write both pieces
>> instead of just a contiguous one.  The caml code would then look less
>> suspicious :)
>>
> I agree with both Andrew and Samuel that it would be a better fix if Xs_ring.write (== “ml_interface_write”) wrote both pieces at once.
>
> However I believe the ‘suspicious’ OCaml patch fixes the specific issue (— or have I missed something?).

Actually the suspicious OCaml is still buggy in the case that there was
a genuine short write followed by a false short write from wrapping the
ring.

>
> Last time I meddled with the C-level ring reading/writing code I didn’t get it quite right. Does anyone have time to prototype what a C-level fix would look like? If we’re short of time I could live with the OCaml-level fix, especially since Wei has done some stress-testing (assuming everyone believes it fixes the issue)

There are several bugs in that function (the remaining size calculations
are plain wrong), and Samuels patch sadly doesn't address all of them. 

I will throw a different patch together addressing all the issues I can
spot.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:10 [PATCH] oxenstored: fix short-write issue Wei Liu
  2015-10-27 17:21 ` Samuel Thibault
  2015-10-27 17:21 ` Andrew Cooper
@ 2015-10-28 14:04 ` David Vrabel
  2015-11-02 13:44 ` Ian Campbell
  3 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-10-28 14:04 UTC (permalink / raw)
  To: Wei Liu, Xen-devel
  Cc: Samuel Thibault, Ian Jackson, Ian Campbell, David Scott

On 27/10/15 17:10, Wei Liu wrote:
> When oxenstored wrote to the ring, it wrote a chunk of contiguous data.
> Originally when it tried to write across ring boundary, it returned a
> short-write when there is still room.  That led to stalling mini-os's
> xenstore thread at times.
> 
> Fix this by calling write function for a second time when the first
> write completes partially.

What happens if the 1st write is short because there is not enough
space, and the 2nd write is short because of the ring wraparound?  You
will still stall.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-10-27 17:10 [PATCH] oxenstored: fix short-write issue Wei Liu
                   ` (2 preceding siblings ...)
  2015-10-28 14:04 ` David Vrabel
@ 2015-11-02 13:44 ` Ian Campbell
  2015-11-02 14:10   ` Andrew Cooper
  3 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-11-02 13:44 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Samuel Thibault, Ian Jackson, David Scott

On Tue, 2015-10-27 at 17:10 +0000, Wei Liu wrote:
> When oxenstored wrote to the ring, it wrote a chunk of contiguous data.
> Originally when it tried to write across ring boundary, it returned a
> short-write when there is still room.  That led to stalling mini-os's
> xenstore thread at times.

What is a "short-write" in this context?

Given data bytes 0..M I assumed it is only writing bytes 0..N and not
N+1..M because the ring boundary is at N. But what is it writing to the
->prod ring pointer N or M?

AIUI writing N should be allowed by the ring protocol, the client should
keep looking for more data until it has a complete request.

Writing M would be a server error.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-11-02 13:44 ` Ian Campbell
@ 2015-11-02 14:10   ` Andrew Cooper
  2015-11-02 14:24     ` Ian Campbell
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-11-02 14:10 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu, Xen-devel
  Cc: Samuel Thibault, Ian Jackson, David Scott

On 02/11/15 13:44, Ian Campbell wrote:
> On Tue, 2015-10-27 at 17:10 +0000, Wei Liu wrote:
>> When oxenstored wrote to the ring, it wrote a chunk of contiguous data.
>> Originally when it tried to write across ring boundary, it returned a
>> short-write when there is still room.  That led to stalling mini-os's
>> xenstore thread at times.
> What is a "short-write" in this context?
>
> Given data bytes 0..M I assumed it is only writing bytes 0..N and not
> N+1..M because the ring boundary is at N. But what is it writing to the
> ->prod ring pointer N or M?

Prod gets incremented by N in this case.

>
> AIUI writing N should be allowed by the ring protocol, the client should
> keep looking for more data until it has a complete request.
>
> Writing M would be a server error.

Correct, and this is what is happening.

The server (believes) that the ring is full, when it is not.  It waits
for the client to make more space in the ring, while the client is
waiting for the server to complete its message in the ring, thus stalling.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-11-02 14:10   ` Andrew Cooper
@ 2015-11-02 14:24     ` Ian Campbell
  2015-11-02 14:27       ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-11-02 14:24 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Xen-devel
  Cc: Samuel Thibault, Ian Jackson, David Scott

On Mon, 2015-11-02 at 14:10 +0000, Andrew Cooper wrote:
> On 02/11/15 13:44, Ian Campbell wrote:
> > On Tue, 2015-10-27 at 17:10 +0000, Wei Liu wrote:
> > > When oxenstored wrote to the ring, it wrote a chunk of contiguous
> > > data.
> > > Originally when it tried to write across ring boundary, it returned a
> > > short-write when there is still room.  That led to stalling mini-os's
> > > xenstore thread at times.
> > What is a "short-write" in this context?
> > 
> > Given data bytes 0..M I assumed it is only writing bytes 0..N and not
> > N+1..M because the ring boundary is at N. But what is it writing to the
> > ->prod ring pointer N or M?
> 
> Prod gets incremented by N in this case.
> 
> > 
> > AIUI writing N should be allowed by the ring protocol, the client
> > should
> > keep looking for more data until it has a complete request.
> > 
> > Writing M would be a server error.
> 
> Correct, and this is what is happening.

The first or second? Your first comment suggests the first, but your second
binds most closely to the second.

> The server (believes) that the ring is full, when it is not.  It waits
> for the client to make more space in the ring, while the client is
> waiting for the server to complete its message in the ring, thus
> stalling.

That makes sense, thanks.

I think this needs to be spelled out more fully in the commit message, in
particular "server thinks ring is full when it is not" is the most relevant
thing, the short write is just how we arrived there.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] oxenstored: fix short-write issue
  2015-11-02 14:24     ` Ian Campbell
@ 2015-11-02 14:27       ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2015-11-02 14:27 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu, Xen-devel
  Cc: Samuel Thibault, Ian Jackson, David Scott

On 02/11/15 14:24, Ian Campbell wrote:
> On Mon, 2015-11-02 at 14:10 +0000, Andrew Cooper wrote:
>> On 02/11/15 13:44, Ian Campbell wrote:
>>> On Tue, 2015-10-27 at 17:10 +0000, Wei Liu wrote:
>>>> When oxenstored wrote to the ring, it wrote a chunk of contiguous
>>>> data.
>>>> Originally when it tried to write across ring boundary, it returned a
>>>> short-write when there is still room.  That led to stalling mini-os's
>>>> xenstore thread at times.
>>> What is a "short-write" in this context?
>>>
>>> Given data bytes 0..M I assumed it is only writing bytes 0..N and not
>>> N+1..M because the ring boundary is at N. But what is it writing to the
>>> ->prod ring pointer N or M?
>> Prod gets incremented by N in this case.
>>
>>> AIUI writing N should be allowed by the ring protocol, the client
>>> should
>>> keep looking for more data until it has a complete request.
>>>
>>> Writing M would be a server error.
>> Correct, and this is what is happening.
> The first or second? Your first comment suggests the first, but your second
> binds most closely to the second.

Oops yes.  Writing M would be an error.  Prod currently gets incremented
by N.

>
>> The server (believes) that the ring is full, when it is not.  It waits
>> for the client to make more space in the ring, while the client is
>> waiting for the server to complete its message in the ring, thus
>> stalling.
> That makes sense, thanks.
>
> I think this needs to be spelled out more fully in the commit message, in
> particular "server thinks ring is full when it is not" is the most relevant
> thing, the short write is just how we arrived there.

This patch here is buggy, and superseeded by one of mine which attempts
to fix the C stubs.

I need to post a v2.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-02 14:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-27 17:10 [PATCH] oxenstored: fix short-write issue Wei Liu
2015-10-27 17:21 ` Samuel Thibault
2015-10-27 17:21 ` Andrew Cooper
2015-10-27 17:26   ` Samuel Thibault
2015-10-27 17:28   ` Samuel Thibault
2015-10-27 17:31     ` Samuel Thibault
2015-10-27 17:31     ` Andrew Cooper
2015-10-28 13:34       ` David Scott
2015-10-28 13:47         ` Samuel Thibault
2015-10-28 14:00         ` Andrew Cooper
2015-10-28 14:04 ` David Vrabel
2015-11-02 13:44 ` Ian Campbell
2015-11-02 14:10   ` Andrew Cooper
2015-11-02 14:24     ` Ian Campbell
2015-11-02 14:27       ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).