[PATCH] xen-netback: count number required slots for an skb more carefully

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] xen-netback: count number required slots for an skb more carefully
@ 2013-09-03 17:29 David Vrabel
  2013-09-03 21:53 ` Wei Liu
  2013-09-04 18:48 ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: David Vrabel @ 2013-09-03 17:29 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev

From: David Vrabel <david.vrabel@citrix.com>

When a VM is providing an iSCSI target and the LUN is used by the
backend domain, the generated skbs for direct I/O writes to the disk
have large, multi-page skb->data but no frags.

With some lengths and starting offsets, xen_netbk_count_skb_slots()
would be one short because the simple calculation of
DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
decisions made by start_new_rx_buffer() which does not guarantee
responses are fully packed.

For example, a skb with length < 2 pages but which spans 3 pages would
be counted as requiring 2 slots but would actually use 3 slots.

skb->data:

    |        1111|222222222222|3333        |

Fully packed, this would need 2 slots:

    |111122222222|22223333    |

But because the 2nd page wholy fits into a slot it is not split across
slots and goes into a slot of its own:

    |1111        |222222222222|3333        |

Miscounting the number of slots means netback may push more responses
than the number of available requests.  This will cause the frontend
to get very confused and report "Too many frags/slots".  The frontend
never recovers and will eventually BUG.

Fix this by counting the number of required slots more carefully.  In
xen_netbk_count_skb_slots(), more closely follow the algorithm used by
xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
is the dry-run equivalent of netbk_gop_frag_copy().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
[Resend to actually Cc netdev, sorry.]
---
 drivers/net/xen-netback/netback.c |   94 +++++++++++++++++++++++++------------
 1 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 64828de..f149ce5 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -361,6 +361,49 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 	return false;
 }
 
+struct xen_netbk_count_slot_state {
+	unsigned long copy_off;
+	bool head;
+};
+
+unsigned int xen_netbk_count_frag_slots(struct xenvif *vif,
+					unsigned long offset, unsigned long size,
+					struct xen_netbk_count_slot_state *state)
+{
+	unsigned count = 0;
+
+	offset &= ~PAGE_MASK;
+
+	while (size > 0) {
+		unsigned long bytes;
+
+		bytes = PAGE_SIZE - offset;
+
+		if (bytes > size)
+			bytes = size;
+
+		if (start_new_rx_buffer(state->copy_off, bytes, state->head)) {
+			count++;
+			state->copy_off = 0;
+		}
+
+		if (state->copy_off + bytes > MAX_BUFFER_OFFSET)
+			bytes = MAX_BUFFER_OFFSET - state->copy_off;
+
+		state->copy_off += bytes;
+
+		offset += bytes;
+		size -= bytes;
+
+		if (offset == PAGE_SIZE)
+			offset = 0;
+
+		state->head = false;
+	}
+
+	return count;
+}
+
 /*
  * Figure out how many ring slots we're going to need to send @skb to
  * the guest. This function is essentially a dry run of
@@ -368,48 +411,39 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
  */
 unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 {
+	struct xen_netbk_count_slot_state state;
 	unsigned int count;
-	int i, copy_off;
+	unsigned char *data;
+	unsigned i;
 
-	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	state.head = true;
+	state.copy_off = 0;
 
-	copy_off = skb_headlen(skb) % PAGE_SIZE;
+	/* Slot for the first (partial) page of data. */
+	count = 1;
 
+	/* Need a slot for the GSO prefix for GSO extra data? */
 	if (skb_shinfo(skb)->gso_size)
 		count++;
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
-		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
-		unsigned long bytes;
-
-		offset &= ~PAGE_MASK;
-
-		while (size > 0) {
-			BUG_ON(offset >= PAGE_SIZE);
-			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
-
-			bytes = PAGE_SIZE - offset;
-
-			if (bytes > size)
-				bytes = size;
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned long offset = offset_in_page(data);
+		unsigned long size = PAGE_SIZE - offset;
 
-			if (start_new_rx_buffer(copy_off, bytes, 0)) {
-				count++;
-				copy_off = 0;
-			}
+		if (data + size > skb_tail_pointer(skb))
+			size = skb_tail_pointer(skb) - data;
 
-			if (copy_off + bytes > MAX_BUFFER_OFFSET)
-				bytes = MAX_BUFFER_OFFSET - copy_off;
+		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
 
-			copy_off += bytes;
+		data += size;
+	}
 
-			offset += bytes;
-			size -= bytes;
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
 
-			if (offset == PAGE_SIZE)
-				offset = 0;
-		}
+		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
 	}
 	return count;
 }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-03 17:29 [PATCH] xen-netback: count number required slots for an skb more carefully David Vrabel
@ 2013-09-03 21:53 ` Wei Liu
  2013-09-04  2:25   ` annie li
  2013-09-04 11:48   ` [PATCH] " David Vrabel
  2013-09-04 18:48 ` David Miller
  1 sibling, 2 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-03 21:53 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell,
	netdev, wei.liu2, msw, annie.li

On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> When a VM is providing an iSCSI target and the LUN is used by the
> backend domain, the generated skbs for direct I/O writes to the disk
> have large, multi-page skb->data but no frags.
> 
> With some lengths and starting offsets, xen_netbk_count_skb_slots()
> would be one short because the simple calculation of
> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
> decisions made by start_new_rx_buffer() which does not guarantee
> responses are fully packed.
> 
> For example, a skb with length < 2 pages but which spans 3 pages would
> be counted as requiring 2 slots but would actually use 3 slots.
> 
> skb->data:
> 
>     |        1111|222222222222|3333        |
> 
> Fully packed, this would need 2 slots:
> 
>     |111122222222|22223333    |
> 
> But because the 2nd page wholy fits into a slot it is not split across
> slots and goes into a slot of its own:
> 
>     |1111        |222222222222|3333        |
> 
> Miscounting the number of slots means netback may push more responses
> than the number of available requests.  This will cause the frontend
> to get very confused and report "Too many frags/slots".  The frontend
> never recovers and will eventually BUG.
> 
> Fix this by counting the number of required slots more carefully.  In
> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> is the dry-run equivalent of netbk_gop_frag_copy().
> 

Phew! So this is backend miscounting bug. I thought it was a frontend
bug so it didn't ring a bell when we had our face-to-face discussion,
sorry. :-(

This bug was discussed back in July among Annie, Matt, Ian and I. We
finally agreed to take Matt's solution. Matt agreed to post final
version within a week but obviously he's too busy to do so. I was away
so I didn't follow closely. Eventually it fell through the crack. :-(

Matt, do you fancy sending the final version? IIRC the commit message
needs to be re-written. I personally still prefer Matt's solution as
it a) make efficient use of the ring, b) uses ring pointers to
calculate slots which is most accurate, c) removes the dependence on
MAX_SKB_FRAGS in guest RX path.

Anyway, we should get this fixed ASAP.

Thanks
Wei.

REF:
<1373409659-22383-1-git-send-email-msw@amazon.com>
<1373350520-19985-1-git-send-email-annie.li@oracle.com>


> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> [Resend to actually Cc netdev, sorry.]
> ---
>  drivers/net/xen-netback/netback.c |   94 +++++++++++++++++++++++++------------
>  1 files changed, 64 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 64828de..f149ce5 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -361,6 +361,49 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>  	return false;
>  }
>  
> +struct xen_netbk_count_slot_state {
> +	unsigned long copy_off;
> +	bool head;
> +};
> +
> +unsigned int xen_netbk_count_frag_slots(struct xenvif *vif,
> +					unsigned long offset, unsigned long size,
> +					struct xen_netbk_count_slot_state *state)
> +{
> +	unsigned count = 0;
> +
> +	offset &= ~PAGE_MASK;
> +
> +	while (size > 0) {
> +		unsigned long bytes;
> +
> +		bytes = PAGE_SIZE - offset;
> +
> +		if (bytes > size)
> +			bytes = size;
> +
> +		if (start_new_rx_buffer(state->copy_off, bytes, state->head)) {
> +			count++;
> +			state->copy_off = 0;
> +		}
> +
> +		if (state->copy_off + bytes > MAX_BUFFER_OFFSET)
> +			bytes = MAX_BUFFER_OFFSET - state->copy_off;
> +
> +		state->copy_off += bytes;
> +
> +		offset += bytes;
> +		size -= bytes;
> +
> +		if (offset == PAGE_SIZE)
> +			offset = 0;
> +
> +		state->head = false;
> +	}
> +
> +	return count;
> +}
> +
>  /*
>   * Figure out how many ring slots we're going to need to send @skb to
>   * the guest. This function is essentially a dry run of
> @@ -368,48 +411,39 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>   */
>  unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
>  {
> +	struct xen_netbk_count_slot_state state;
>  	unsigned int count;
> -	int i, copy_off;
> +	unsigned char *data;
> +	unsigned i;
>  
> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +	state.head = true;
> +	state.copy_off = 0;
>  
> -	copy_off = skb_headlen(skb) % PAGE_SIZE;
> +	/* Slot for the first (partial) page of data. */
> +	count = 1;
>  
> +	/* Need a slot for the GSO prefix for GSO extra data? */
>  	if (skb_shinfo(skb)->gso_size)
>  		count++;
>  
> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> -		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> -		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
> -		unsigned long bytes;
> -
> -		offset &= ~PAGE_MASK;
> -
> -		while (size > 0) {
> -			BUG_ON(offset >= PAGE_SIZE);
> -			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
> -
> -			bytes = PAGE_SIZE - offset;
> -
> -			if (bytes > size)
> -				bytes = size;
> +	data = skb->data;
> +	while (data < skb_tail_pointer(skb)) {
> +		unsigned long offset = offset_in_page(data);
> +		unsigned long size = PAGE_SIZE - offset;
>  
> -			if (start_new_rx_buffer(copy_off, bytes, 0)) {
> -				count++;
> -				copy_off = 0;
> -			}
> +		if (data + size > skb_tail_pointer(skb))
> +			size = skb_tail_pointer(skb) - data;
>  
> -			if (copy_off + bytes > MAX_BUFFER_OFFSET)
> -				bytes = MAX_BUFFER_OFFSET - copy_off;
> +		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
>  
> -			copy_off += bytes;
> +		data += size;
> +	}
>  
> -			offset += bytes;
> -			size -= bytes;
> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> +		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
>  
> -			if (offset == PAGE_SIZE)
> -				offset = 0;
> -		}
> +		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
>  	}
>  	return count;
>  }
> -- 
> 1.7.2.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-03 21:53 ` Wei Liu
@ 2013-09-04  2:25   ` annie li
  2013-09-04  6:04     ` Matt Wilson
  2013-09-04 11:48   ` [PATCH] " David Vrabel
  1 sibling, 1 reply; 21+ messages in thread
From: annie li @ 2013-09-04  2:25 UTC (permalink / raw)
  To: Wei Liu
  Cc: David Vrabel, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev, msw


On 2013-9-4 5:53, Wei Liu wrote:
> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> When a VM is providing an iSCSI target and the LUN is used by the
>> backend domain, the generated skbs for direct I/O writes to the disk
>> have large, multi-page skb->data but no frags.
>>
>> With some lengths and starting offsets, xen_netbk_count_skb_slots()
>> would be one short because the simple calculation of
>> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
>> decisions made by start_new_rx_buffer() which does not guarantee
>> responses are fully packed.
>>
>> For example, a skb with length < 2 pages but which spans 3 pages would
>> be counted as requiring 2 slots but would actually use 3 slots.
>>
>> skb->data:
>>
>>      |        1111|222222222222|3333        |
>>
>> Fully packed, this would need 2 slots:
>>
>>      |111122222222|22223333    |
>>
>> But because the 2nd page wholy fits into a slot it is not split across
>> slots and goes into a slot of its own:
>>
>>      |1111        |222222222222|3333        |
>>
>> Miscounting the number of slots means netback may push more responses
>> than the number of available requests.  This will cause the frontend
>> to get very confused and report "Too many frags/slots".  The frontend
>> never recovers and will eventually BUG.
>>
>> Fix this by counting the number of required slots more carefully.  In
>> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
>> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
>> is the dry-run equivalent of netbk_gop_frag_copy().
>>
> Phew! So this is backend miscounting bug. I thought it was a frontend
> bug so it didn't ring a bell when we had our face-to-face discussion,
> sorry. :-(
>
> This bug was discussed back in July among Annie, Matt, Ian and I. We
> finally agreed to take Matt's solution. Matt agreed to post final
> version within a week but obviously he's too busy to do so. I was away
> so I didn't follow closely. Eventually it fell through the crack. :-(

The fixes can be implemented in two ways, one is fix in 
xen_netbk_count_skb_slots to return correct slot count, my 
patch(http://lists.xen.org/archives/html/xen-devel/2013-07/msg00785.html) and 
David's fall in this way. The other way is fixed in netbk_gop_frag_copy 
which is 
Matt's(http://lists.xen.org/archives/html/xen-devel/2013-07/msg00760.html).

> Matt, do you fancy sending the final version? IIRC the commit message
> needs to be re-written. I personally still prefer Matt's solution as
> it a) make efficient use of the ring, b) uses ring pointers to
> calculate slots which is most accurate, c) removes the dependence on
> MAX_SKB_FRAGS in guest RX path.
>
> Anyway, we should get this fixed ASAP.

Totally agree. This issue is easy to be reproduced with large MTU. It is 
better to upstream the fix soon in case others hit it and waste time to 
fix it.

Thanks
Annie
>
> Thanks
> Wei.
>
> REF:
> <1373409659-22383-1-git-send-email-msw@amazon.com>
> <1373350520-19985-1-git-send-email-annie.li@oracle.com>
>
>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> ---
>> [Resend to actually Cc netdev, sorry.]
>> ---
>>   drivers/net/xen-netback/netback.c |   94 +++++++++++++++++++++++++------------
>>   1 files changed, 64 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 64828de..f149ce5 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -361,6 +361,49 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>>   	return false;
>>   }
>>   
>> +struct xen_netbk_count_slot_state {
>> +	unsigned long copy_off;
>> +	bool head;
>> +};
>> +
>> +unsigned int xen_netbk_count_frag_slots(struct xenvif *vif,
>> +					unsigned long offset, unsigned long size,
>> +					struct xen_netbk_count_slot_state *state)
>> +{
>> +	unsigned count = 0;
>> +
>> +	offset &= ~PAGE_MASK;
>> +
>> +	while (size > 0) {
>> +		unsigned long bytes;
>> +
>> +		bytes = PAGE_SIZE - offset;
>> +
>> +		if (bytes > size)
>> +			bytes = size;
>> +
>> +		if (start_new_rx_buffer(state->copy_off, bytes, state->head)) {
>> +			count++;
>> +			state->copy_off = 0;
>> +		}
>> +
>> +		if (state->copy_off + bytes > MAX_BUFFER_OFFSET)
>> +			bytes = MAX_BUFFER_OFFSET - state->copy_off;
>> +
>> +		state->copy_off += bytes;
>> +
>> +		offset += bytes;
>> +		size -= bytes;
>> +
>> +		if (offset == PAGE_SIZE)
>> +			offset = 0;
>> +
>> +		state->head = false;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>>   /*
>>    * Figure out how many ring slots we're going to need to send @skb to
>>    * the guest. This function is essentially a dry run of
>> @@ -368,48 +411,39 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>>    */
>>   unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
>>   {
>> +	struct xen_netbk_count_slot_state state;
>>   	unsigned int count;
>> -	int i, copy_off;
>> +	unsigned char *data;
>> +	unsigned i;
>>   
>> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
>> +	state.head = true;
>> +	state.copy_off = 0;
>>   
>> -	copy_off = skb_headlen(skb) % PAGE_SIZE;
>> +	/* Slot for the first (partial) page of data. */
>> +	count = 1;
>>   
>> +	/* Need a slot for the GSO prefix for GSO extra data? */
>>   	if (skb_shinfo(skb)->gso_size)
>>   		count++;
>>   
>> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>> -		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>> -		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
>> -		unsigned long bytes;
>> -
>> -		offset &= ~PAGE_MASK;
>> -
>> -		while (size > 0) {
>> -			BUG_ON(offset >= PAGE_SIZE);
>> -			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
>> -
>> -			bytes = PAGE_SIZE - offset;
>> -
>> -			if (bytes > size)
>> -				bytes = size;
>> +	data = skb->data;
>> +	while (data < skb_tail_pointer(skb)) {
>> +		unsigned long offset = offset_in_page(data);
>> +		unsigned long size = PAGE_SIZE - offset;
>>   
>> -			if (start_new_rx_buffer(copy_off, bytes, 0)) {
>> -				count++;
>> -				copy_off = 0;
>> -			}
>> +		if (data + size > skb_tail_pointer(skb))
>> +			size = skb_tail_pointer(skb) - data;
>>   
>> -			if (copy_off + bytes > MAX_BUFFER_OFFSET)
>> -				bytes = MAX_BUFFER_OFFSET - copy_off;
>> +		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
>>   
>> -			copy_off += bytes;
>> +		data += size;
>> +	}
>>   
>> -			offset += bytes;
>> -			size -= bytes;
>> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>> +		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>> +		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
>>   
>> -			if (offset == PAGE_SIZE)
>> -				offset = 0;
>> -		}
>> +		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
>>   	}
>>   	return count;
>>   }
>> -- 
>> 1.7.2.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: xen-netback: count number required slots for an skb more carefully
  2013-09-04  2:25   ` annie li
@ 2013-09-04  6:04     ` Matt Wilson
  2013-09-04  6:56       ` [Xen-devel] " annie li
  2013-09-04  8:36       ` Ian Campbell
  0 siblings, 2 replies; 21+ messages in thread
From: Matt Wilson @ 2013-09-04  6:04 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Ian Campbell, netdev, msw

On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> On 2013-9-4 5:53, Wei Liu wrote:
[...]
> >Matt, do you fancy sending the final version? IIRC the commit message
> >needs to be re-written. I personally still prefer Matt's solution as
> >it a) make efficient use of the ring, b) uses ring pointers to
> >calculate slots which is most accurate, c) removes the dependence on
> >MAX_SKB_FRAGS in guest RX path.
> >
> >Anyway, we should get this fixed ASAP.
> 
> Totally agree. This issue is easy to be reproduced with large MTU.
> It is better to upstream the fix soon in case others hit it and
> waste time to fix it.

I'd like to go with Xi's proposed patch that I posed earlier. The main
thing that's kept me from sending a final version is lack of time to
retest against a newer kernel.

Could someone help out with that? I probably can't get to it until the
end of the week.

Sorry for the delay. :-(

--msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
  2013-09-04  6:04     ` Matt Wilson
@ 2013-09-04  6:56       ` annie li
  2013-09-04  7:38         ` Wei Liu
  2013-09-04  8:36       ` Ian Campbell
  1 sibling, 1 reply; 21+ messages in thread
From: annie li @ 2013-09-04  6:56 UTC (permalink / raw)
  To: Matt Wilson
  Cc: Wei Liu, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Ian Campbell, netdev, msw


On 2013-9-4 14:04, Matt Wilson wrote:
> On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
>> On 2013-9-4 5:53, Wei Liu wrote:
> [...]
>>> Matt, do you fancy sending the final version? IIRC the commit message
>>> needs to be re-written. I personally still prefer Matt's solution as
>>> it a) make efficient use of the ring, b) uses ring pointers to
>>> calculate slots which is most accurate, c) removes the dependence on
>>> MAX_SKB_FRAGS in guest RX path.
>>>
>>> Anyway, we should get this fixed ASAP.
>> Totally agree. This issue is easy to be reproduced with large MTU.
>> It is better to upstream the fix soon in case others hit it and
>> waste time to fix it.
> I'd like to go with Xi's proposed patch that I posed earlier. The main
> thing that's kept me from sending a final version is lack of time to
> retest against a newer kernel.
>
> Could someone help out with that? I probably can't get to it until the
> end of the week.

I can rebase it. Since wei's NAPI &1:1 model patch went into net-next 
already, it should not be able to applied directly.

Thanks
Annie
>
> Sorry for the delay. :-(
>
> --msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
  2013-09-04  6:56       ` [Xen-devel] " annie li
@ 2013-09-04  7:38         ` Wei Liu
  2013-09-04  8:06           ` annie li
  0 siblings, 1 reply; 21+ messages in thread
From: Wei Liu @ 2013-09-04  7:38 UTC (permalink / raw)
  To: annie li
  Cc: Matt Wilson, Wei Liu, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell, netdev, msw

On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
> 
> On 2013-9-4 14:04, Matt Wilson wrote:
> >On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> >>On 2013-9-4 5:53, Wei Liu wrote:
> >[...]
> >>>Matt, do you fancy sending the final version? IIRC the commit message
> >>>needs to be re-written. I personally still prefer Matt's solution as
> >>>it a) make efficient use of the ring, b) uses ring pointers to
> >>>calculate slots which is most accurate, c) removes the dependence on
> >>>MAX_SKB_FRAGS in guest RX path.
> >>>
> >>>Anyway, we should get this fixed ASAP.
> >>Totally agree. This issue is easy to be reproduced with large MTU.
> >>It is better to upstream the fix soon in case others hit it and
> >>waste time to fix it.
> >I'd like to go with Xi's proposed patch that I posed earlier. The main
> >thing that's kept me from sending a final version is lack of time to
> >retest against a newer kernel.
> >
> >Could someone help out with that? I probably can't get to it until the
> >end of the week.
> 
> I can rebase it. Since wei's NAPI &1:1 model patch went into
> net-next already, it should not be able to applied directly.
> 

I think this one should go to net then queue up for stable, not
net-next.

Wei.

> Thanks
> Annie
> >
> >Sorry for the delay. :-(
> >
> >--msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
  2013-09-04  7:38         ` Wei Liu
@ 2013-09-04  8:06           ` annie li
  2013-09-04  8:20             ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread
From: annie li @ 2013-09-04  8:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Matt Wilson, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Ian Campbell, netdev, msw


On 2013-9-4 15:38, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
>> On 2013-9-4 14:04, Matt Wilson wrote:
>>> On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
>>>> On 2013-9-4 5:53, Wei Liu wrote:
>>> [...]
>>>>> Matt, do you fancy sending the final version? IIRC the commit message
>>>>> needs to be re-written. I personally still prefer Matt's solution as
>>>>> it a) make efficient use of the ring, b) uses ring pointers to
>>>>> calculate slots which is most accurate, c) removes the dependence on
>>>>> MAX_SKB_FRAGS in guest RX path.
>>>>>
>>>>> Anyway, we should get this fixed ASAP.
>>>> Totally agree. This issue is easy to be reproduced with large MTU.
>>>> It is better to upstream the fix soon in case others hit it and
>>>> waste time to fix it.
>>> I'd like to go with Xi's proposed patch that I posed earlier. The main
>>> thing that's kept me from sending a final version is lack of time to
>>> retest against a newer kernel.
>>>
>>> Could someone help out with that? I probably can't get to it until the
>>> end of the week.
>> I can rebase it. Since wei's NAPI &1:1 model patch went into
>> net-next already, it should not be able to applied directly.
>>
> I think this one should go to net then queue up for stable, not
> net-next.
If so, that would be easier. It is almost end of the day here, and I do 
not have enough time to test it until tomorrow. Please go ahead with it 
if you have time.

Thanks
Annie
> Wei.
>
>> Thanks
>> Annie
>>> Sorry for the delay. :-(
>>>
>>> --msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
  2013-09-04  8:06           ` annie li
@ 2013-09-04  8:20             ` Wei Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-04  8:20 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, Matt Wilson, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell, netdev, msw

On Wed, Sep 04, 2013 at 04:06:51PM +0800, annie li wrote:
> 
> On 2013-9-4 15:38, Wei Liu wrote:
> >On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
> >>On 2013-9-4 14:04, Matt Wilson wrote:
> >>>On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> >>>>On 2013-9-4 5:53, Wei Liu wrote:
> >>>[...]
> >>>>>Matt, do you fancy sending the final version? IIRC the commit message
> >>>>>needs to be re-written. I personally still prefer Matt's solution as
> >>>>>it a) make efficient use of the ring, b) uses ring pointers to
> >>>>>calculate slots which is most accurate, c) removes the dependence on
> >>>>>MAX_SKB_FRAGS in guest RX path.
> >>>>>
> >>>>>Anyway, we should get this fixed ASAP.
> >>>>Totally agree. This issue is easy to be reproduced with large MTU.
> >>>>It is better to upstream the fix soon in case others hit it and
> >>>>waste time to fix it.
> >>>I'd like to go with Xi's proposed patch that I posed earlier. The main
> >>>thing that's kept me from sending a final version is lack of time to
> >>>retest against a newer kernel.
> >>>
> >>>Could someone help out with that? I probably can't get to it until the
> >>>end of the week.
> >>I can rebase it. Since wei's NAPI &1:1 model patch went into
> >>net-next already, it should not be able to applied directly.
> >>
> >I think this one should go to net then queue up for stable, not
> >net-next.
> If so, that would be easier. It is almost end of the day here, and I
> do not have enough time to test it until tomorrow. Please go ahead
> with it if you have time.
> 

Sure, I'm fine with this. I will spend some time rebase it today. But
eventually I would like to get it tested as much as possible. That's to
say, I would need confirmation / acked-by from you, David and Matt.

Wei.

> Thanks
> Annie
> >Wei.
> >
> >>Thanks
> >>Annie
> >>>Sorry for the delay. :-(
> >>>
> >>>--msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: xen-netback: count number required slots for an skb more carefully
  2013-09-04  6:04     ` Matt Wilson
  2013-09-04  6:56       ` [Xen-devel] " annie li
@ 2013-09-04  8:36       ` Ian Campbell
  2013-09-04  8:41         ` Wei Liu
  1 sibling, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2013-09-04  8:36 UTC (permalink / raw)
  To: Matt Wilson
  Cc: annie li, Wei Liu, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, netdev, msw

On Tue, 2013-09-03 at 23:04 -0700, Matt Wilson wrote:
> On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> > On 2013-9-4 5:53, Wei Liu wrote:
> [...]
> > >Matt, do you fancy sending the final version? IIRC the commit message
> > >needs to be re-written. I personally still prefer Matt's solution as
> > >it a) make efficient use of the ring, b) uses ring pointers to
> > >calculate slots which is most accurate, c) removes the dependence on
> > >MAX_SKB_FRAGS in guest RX path.
> > >
> > >Anyway, we should get this fixed ASAP.

Yes. Would there be any harm in just applying David's patch (just
because it is the one currently in our hands and it is based on a recent
enough kernel). It looks right from a correctness PoV to me.

> > Totally agree. This issue is easy to be reproduced with large MTU.
> > It is better to upstream the fix soon in case others hit it and
> > waste time to fix it.
> 
> I'd like to go with Xi's proposed patch that I posed earlier.

I don't think I saw that, do you have a message-id?

("Xi" is a bit too short for a search on my INBOX unfortunately).

>  The main
> thing that's kept me from sending a final version is lack of time to
> retest against a newer kernel.
> 
> Could someone help out with that? I probably can't get to it until the
> end of the week.
> 
> Sorry for the delay. :-(
> 
> --msw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: xen-netback: count number required slots for an skb more carefully
  2013-09-04  8:36       ` Ian Campbell
@ 2013-09-04  8:41         ` Wei Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-04  8:41 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Matt Wilson, annie li, Wei Liu, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev, msw

On Wed, Sep 04, 2013 at 09:36:10AM +0100, Ian Campbell wrote:
> On Tue, 2013-09-03 at 23:04 -0700, Matt Wilson wrote:
> > On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> > > On 2013-9-4 5:53, Wei Liu wrote:
> > [...]
> > > >Matt, do you fancy sending the final version? IIRC the commit message
> > > >needs to be re-written. I personally still prefer Matt's solution as
> > > >it a) make efficient use of the ring, b) uses ring pointers to
> > > >calculate slots which is most accurate, c) removes the dependence on
> > > >MAX_SKB_FRAGS in guest RX path.
> > > >
> > > >Anyway, we should get this fixed ASAP.
> 
> Yes. Would there be any harm in just applying David's patch (just
> because it is the one currently in our hands and it is based on a recent
> enough kernel). It looks right from a correctness PoV to me.
> 
> > > Totally agree. This issue is easy to be reproduced with large MTU.
> > > It is better to upstream the fix soon in case others hit it and
> > > waste time to fix it.
> > 
> > I'd like to go with Xi's proposed patch that I posed earlier.
> 
> I don't think I saw that, do you have a message-id?
> 
> ("Xi" is a bit too short for a search on my INBOX unfortunately).
> 

This one <1373409659-22383-1-git-send-email-msw@amazon.com>

Wei.

> >  The main
> > thing that's kept me from sending a final version is lack of time to
> > retest against a newer kernel.
> > 
> > Could someone help out with that? I probably can't get to it until the
> > end of the week.
> > 
> > Sorry for the delay. :-(
> > 
> > --msw
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-03 21:53 ` Wei Liu
  2013-09-04  2:25   ` annie li
@ 2013-09-04 11:48   ` David Vrabel
  2013-09-04 12:41     ` Ian Campbell
  2013-09-04 13:14     ` Wei Liu
  1 sibling, 2 replies; 21+ messages in thread
From: David Vrabel @ 2013-09-04 11:48 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell,
	netdev, msw, annie.li

On 03/09/13 22:53, Wei Liu wrote:
> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> When a VM is providing an iSCSI target and the LUN is used by the
>> backend domain, the generated skbs for direct I/O writes to the disk
>> have large, multi-page skb->data but no frags.
>>
>> With some lengths and starting offsets, xen_netbk_count_skb_slots()
>> would be one short because the simple calculation of
>> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
>> decisions made by start_new_rx_buffer() which does not guarantee
>> responses are fully packed.
>>
>> For example, a skb with length < 2 pages but which spans 3 pages would
>> be counted as requiring 2 slots but would actually use 3 slots.
>>
>> skb->data:
>>
>>     |        1111|222222222222|3333        |
>>
>> Fully packed, this would need 2 slots:
>>
>>     |111122222222|22223333    |
>>
>> But because the 2nd page wholy fits into a slot it is not split across
>> slots and goes into a slot of its own:
>>
>>     |1111        |222222222222|3333        |
>>
>> Miscounting the number of slots means netback may push more responses
>> than the number of available requests.  This will cause the frontend
>> to get very confused and report "Too many frags/slots".  The frontend
>> never recovers and will eventually BUG.
>>
>> Fix this by counting the number of required slots more carefully.  In
>> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
>> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
>> is the dry-run equivalent of netbk_gop_frag_copy().
>>
> 
> Phew! So this is backend miscounting bug. I thought it was a frontend
> bug so it didn't ring a bell when we had our face-to-face discussion,
> sorry. :-(
> 
> This bug was discussed back in July among Annie, Matt, Ian and I. We
> finally agreed to take Matt's solution. Matt agreed to post final
> version within a week but obviously he's too busy to do so. I was away
> so I didn't follow closely. Eventually it fell through the crack. :-(

I think I prefer fixing the counting for backporting to stable kernels.
 Xi's approach of packing the ring differently is a change in frontend
visible behaviour and seems more risky. e.g., possible performance
impact so I would like to see some performance analysis of that approach.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 11:48   ` [PATCH] " David Vrabel
@ 2013-09-04 12:41     ` Ian Campbell
  2013-09-04 13:35       ` Wei Liu
  2013-09-09  9:20       ` Wei Liu
  2013-09-04 13:14     ` Wei Liu
  1 sibling, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2013-09-04 12:41 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	netdev, msw, annie.li

On Wed, 2013-09-04 at 12:48 +0100, David Vrabel wrote:
> On 03/09/13 22:53, Wei Liu wrote:
> > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> When a VM is providing an iSCSI target and the LUN is used by the
> >> backend domain, the generated skbs for direct I/O writes to the disk
> >> have large, multi-page skb->data but no frags.
> >>
> >> With some lengths and starting offsets, xen_netbk_count_skb_slots()
> >> would be one short because the simple calculation of
> >> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
> >> decisions made by start_new_rx_buffer() which does not guarantee
> >> responses are fully packed.
> >>
> >> For example, a skb with length < 2 pages but which spans 3 pages would
> >> be counted as requiring 2 slots but would actually use 3 slots.
> >>
> >> skb->data:
> >>
> >>     |        1111|222222222222|3333        |
> >>
> >> Fully packed, this would need 2 slots:
> >>
> >>     |111122222222|22223333    |
> >>
> >> But because the 2nd page wholy fits into a slot it is not split across
> >> slots and goes into a slot of its own:
> >>
> >>     |1111        |222222222222|3333        |
> >>
> >> Miscounting the number of slots means netback may push more responses
> >> than the number of available requests.  This will cause the frontend
> >> to get very confused and report "Too many frags/slots".  The frontend
> >> never recovers and will eventually BUG.
> >>
> >> Fix this by counting the number of required slots more carefully.  In
> >> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> >> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> >> is the dry-run equivalent of netbk_gop_frag_copy().
> >>
> > 
> > Phew! So this is backend miscounting bug. I thought it was a frontend
> > bug so it didn't ring a bell when we had our face-to-face discussion,
> > sorry. :-(
> > 
> > This bug was discussed back in July among Annie, Matt, Ian and I. We
> > finally agreed to take Matt's solution. Matt agreed to post final
> > version within a week but obviously he's too busy to do so. I was away
> > so I didn't follow closely. Eventually it fell through the crack. :-(
> 
> I think I prefer fixing the counting for backporting to stable kernels.

That's a good argument. I think we should take this patch, or something
very like it, now and then rebase the more complex thing on top.

>  Xi's approach of packing the ring differently is a change in frontend
> visible behaviour and seems more risky. e.g., possible performance
> impact so I would like to see some performance analysis of that approach.

Yes.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 11:48   ` [PATCH] " David Vrabel
  2013-09-04 12:41     ` Ian Campbell
@ 2013-09-04 13:14     ` Wei Liu
  2013-09-04 14:02       ` David Vrabel
  1 sibling, 1 reply; 21+ messages in thread
From: Wei Liu @ 2013-09-04 13:14 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev, msw, annie.li

On Wed, Sep 04, 2013 at 12:48:15PM +0100, David Vrabel wrote:
> On 03/09/13 22:53, Wei Liu wrote:
> > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> When a VM is providing an iSCSI target and the LUN is used by the
> >> backend domain, the generated skbs for direct I/O writes to the disk
> >> have large, multi-page skb->data but no frags.
> >>
> >> With some lengths and starting offsets, xen_netbk_count_skb_slots()
> >> would be one short because the simple calculation of
> >> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
> >> decisions made by start_new_rx_buffer() which does not guarantee
> >> responses are fully packed.
> >>
> >> For example, a skb with length < 2 pages but which spans 3 pages would
> >> be counted as requiring 2 slots but would actually use 3 slots.
> >>
> >> skb->data:
> >>
> >>     |        1111|222222222222|3333        |
> >>
> >> Fully packed, this would need 2 slots:
> >>
> >>     |111122222222|22223333    |
> >>
> >> But because the 2nd page wholy fits into a slot it is not split across
> >> slots and goes into a slot of its own:
> >>
> >>     |1111        |222222222222|3333        |
> >>
> >> Miscounting the number of slots means netback may push more responses
> >> than the number of available requests.  This will cause the frontend
> >> to get very confused and report "Too many frags/slots".  The frontend
> >> never recovers and will eventually BUG.
> >>
> >> Fix this by counting the number of required slots more carefully.  In
> >> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> >> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> >> is the dry-run equivalent of netbk_gop_frag_copy().
> >>
> > 
> > Phew! So this is backend miscounting bug. I thought it was a frontend
> > bug so it didn't ring a bell when we had our face-to-face discussion,
> > sorry. :-(
> > 
> > This bug was discussed back in July among Annie, Matt, Ian and I. We
> > finally agreed to take Matt's solution. Matt agreed to post final
> > version within a week but obviously he's too busy to do so. I was away
> > so I didn't follow closely. Eventually it fell through the crack. :-(
> 
> I think I prefer fixing the counting for backporting to stable kernels.

The original patch has coding style change. Sans that contextual change
it's not a very long patch.

>  Xi's approach of packing the ring differently is a change in frontend
> visible behaviour and seems more risky. e.g., possible performance
> impact so I would like to see some performance analysis of that approach.
> 

With Xi's approach it is more efficient for backend to process. As we
now use one less grant copy operation which means we copy the same
amount of data with less grant ops.

>From frontend's PoV I think the impact is minimal. Frontend is involved
in assembling the packets. It only takes what's in the ring and chain
them together. The operation involves copying so far is the
__pskb_pull_tail which happens a) in rare case when there's more frags
than frontend's MAX_SKB_FRAGS, b) when pull_to > skb_headlen which
happens. With Xi's change the rare case a) will even be rarer than
before as we use less slots. b) happens the same as it happens before
Xi's change, because the pull is guarded by "if (pull_to >
skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.

So overall I don't see obvious downside.

Wei.

> David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 12:41     ` Ian Campbell
@ 2013-09-04 13:35       ` Wei Liu
  2013-09-09  9:20       ` Wei Liu
  1 sibling, 0 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-04 13:35 UTC (permalink / raw)
  To: Ian Campbell
  Cc: David Vrabel, Wei Liu, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, netdev, msw, annie.li

On Wed, Sep 04, 2013 at 01:41:25PM +0100, Ian Campbell wrote:
> On Wed, 2013-09-04 at 12:48 +0100, David Vrabel wrote:
> > On 03/09/13 22:53, Wei Liu wrote:
> > > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> > >> From: David Vrabel <david.vrabel@citrix.com>
> > >>
> > >> When a VM is providing an iSCSI target and the LUN is used by the
> > >> backend domain, the generated skbs for direct I/O writes to the disk
> > >> have large, multi-page skb->data but no frags.
> > >>
> > >> With some lengths and starting offsets, xen_netbk_count_skb_slots()
> > >> would be one short because the simple calculation of
> > >> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
> > >> decisions made by start_new_rx_buffer() which does not guarantee
> > >> responses are fully packed.
> > >>
> > >> For example, a skb with length < 2 pages but which spans 3 pages would
> > >> be counted as requiring 2 slots but would actually use 3 slots.
> > >>
> > >> skb->data:
> > >>
> > >>     |        1111|222222222222|3333        |
> > >>
> > >> Fully packed, this would need 2 slots:
> > >>
> > >>     |111122222222|22223333    |
> > >>
> > >> But because the 2nd page wholy fits into a slot it is not split across
> > >> slots and goes into a slot of its own:
> > >>
> > >>     |1111        |222222222222|3333        |
> > >>
> > >> Miscounting the number of slots means netback may push more responses
> > >> than the number of available requests.  This will cause the frontend
> > >> to get very confused and report "Too many frags/slots".  The frontend
> > >> never recovers and will eventually BUG.
> > >>
> > >> Fix this by counting the number of required slots more carefully.  In
> > >> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> > >> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> > >> is the dry-run equivalent of netbk_gop_frag_copy().
> > >>
> > > 
> > > Phew! So this is backend miscounting bug. I thought it was a frontend
> > > bug so it didn't ring a bell when we had our face-to-face discussion,
> > > sorry. :-(
> > > 
> > > This bug was discussed back in July among Annie, Matt, Ian and I. We
> > > finally agreed to take Matt's solution. Matt agreed to post final
> > > version within a week but obviously he's too busy to do so. I was away
> > > so I didn't follow closely. Eventually it fell through the crack. :-(
> > 
> > I think I prefer fixing the counting for backporting to stable kernels.
> 
> That's a good argument. I think we should take this patch, or something
> very like it, now and then rebase the more complex thing on top.
> 

It's not as complex as you first see it. David's and Xi's approaches are
different routes to the same destination.

David's approach makes counting fits what netbk_gop_frag_copy actually
does, while Xi's approach is the other way around -- makes
netbk_gop_frag_copy fits what counting returns.

As long as we have them agreed with each other we're fine.

> >  Xi's approach of packing the ring differently is a change in frontend
> > visible behaviour and seems more risky. e.g., possible performance
> > impact so I would like to see some performance analysis of that approach.
> 
> Yes.

The performance impact is more concerning. I have a short analysis in
the reply to David's email.

Wei.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 13:14     ` Wei Liu
@ 2013-09-04 14:02       ` David Vrabel
  2013-09-04 15:44         ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2013-09-04 14:02 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell,
	netdev, msw, annie.li

On 04/09/13 14:14, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 12:48:15PM +0100, David Vrabel wrote:
>> On 03/09/13 22:53, Wei Liu wrote:
>>> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>>>
>>>> When a VM is providing an iSCSI target and the LUN is used by the
>>>> backend domain, the generated skbs for direct I/O writes to the disk
>>>> have large, multi-page skb->data but no frags.
>>>>
>>>> With some lengths and starting offsets, xen_netbk_count_skb_slots()
>>>> would be one short because the simple calculation of
>>>> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
>>>> decisions made by start_new_rx_buffer() which does not guarantee
>>>> responses are fully packed.
>>>>
>>>> For example, a skb with length < 2 pages but which spans 3 pages would
>>>> be counted as requiring 2 slots but would actually use 3 slots.
>>>>
>>>> skb->data:
>>>>
>>>>     |        1111|222222222222|3333        |
>>>>
>>>> Fully packed, this would need 2 slots:
>>>>
>>>>     |111122222222|22223333    |
>>>>
>>>> But because the 2nd page wholy fits into a slot it is not split across
>>>> slots and goes into a slot of its own:
>>>>
>>>>     |1111        |222222222222|3333        |
>>>>
>>>> Miscounting the number of slots means netback may push more responses
>>>> than the number of available requests.  This will cause the frontend
>>>> to get very confused and report "Too many frags/slots".  The frontend
>>>> never recovers and will eventually BUG.
>>>>
>>>> Fix this by counting the number of required slots more carefully.  In
>>>> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
>>>> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
>>>> is the dry-run equivalent of netbk_gop_frag_copy().
>>>>
>>>
>>> Phew! So this is backend miscounting bug. I thought it was a frontend
>>> bug so it didn't ring a bell when we had our face-to-face discussion,
>>> sorry. :-(
>>>
>>> This bug was discussed back in July among Annie, Matt, Ian and I. We
>>> finally agreed to take Matt's solution. Matt agreed to post final
>>> version within a week but obviously he's too busy to do so. I was away
>>> so I didn't follow closely. Eventually it fell through the crack. :-(
>>
>> I think I prefer fixing the counting for backporting to stable kernels.
> 
> The original patch has coding style change. Sans that contextual change
> it's not a very long patch.

The size of the patch isn't the main concern for backport-ability.  It's
the frontend visible changes and thus any (unexpected) impacts on
frontends -- this is especially important as only a small fraction of
frontends in use will be tested with these changes.

>>  Xi's approach of packing the ring differently is a change in frontend
>> visible behaviour and seems more risky. e.g., possible performance
>> impact so I would like to see some performance analysis of that approach.
>>
> 
> With Xi's approach it is more efficient for backend to process. As we
> now use one less grant copy operation which means we copy the same
> amount of data with less grant ops.

It think it uses more grant ops because the copies of the linear
portion are in chunks that do not cross source page boundaries.

i.e., in netbk_gop_skb():

	data = skb->data;
	while (data < skb_tail_pointer(skb)) {
		unsigned int offset = offset_in_page(data);
		unsigned int len = PAGE_SIZE - offset;
                [...]

It wasn't clear from the patch that this had been considered and that
any extra space needed in the grant op array was made available.

> From frontend's PoV I think the impact is minimal. Frontend is involved
> in assembling the packets. It only takes what's in the ring and chain
> them together. The operation involves copying so far is the
> __pskb_pull_tail which happens a) in rare case when there's more frags
> than frontend's MAX_SKB_FRAGS, b) when pull_to > skb_headlen which
> happens. With Xi's change the rare case a) will even be rarer than
> before as we use less slots. b) happens the same as it happens before
> Xi's change, because the pull is guarded by "if (pull_to >
> skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.
> 
> So overall I don't see obvious downside.

The obvious downside is it doesn't exist (in a form that can be applied
now), it hasn't been tested and I think there may well be a subtle bug
that would need a careful review or testing to confirm/deny.

You are free to work on this as a future improvements but I really don't
see why this critical bug fix needs to be delayed any further.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 14:02       ` David Vrabel
@ 2013-09-04 15:44         ` Wei Liu
  2013-09-05 10:12           ` David Vrabel
  0 siblings, 1 reply; 21+ messages in thread
From: Wei Liu @ 2013-09-04 15:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev, msw, annie.li

On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
[...]
> >>
> >> I think I prefer fixing the counting for backporting to stable kernels.
> > 
> > The original patch has coding style change. Sans that contextual change
> > it's not a very long patch.
> 
> The size of the patch isn't the main concern for backport-ability.  It's
> the frontend visible changes and thus any (unexpected) impacts on
> frontends -- this is especially important as only a small fraction of
> frontends in use will be tested with these changes.
> 
> >>  Xi's approach of packing the ring differently is a change in frontend
> >> visible behaviour and seems more risky. e.g., possible performance
> >> impact so I would like to see some performance analysis of that approach.
> >>
> > 
> > With Xi's approach it is more efficient for backend to process. As we
> > now use one less grant copy operation which means we copy the same
> > amount of data with less grant ops.
> 
> It think it uses more grant ops because the copies of the linear
> portion are in chunks that do not cross source page boundaries.
> 
> i.e., in netbk_gop_skb():
> 
> 	data = skb->data;
> 	while (data < skb_tail_pointer(skb)) {
> 		unsigned int offset = offset_in_page(data);
> 		unsigned int len = PAGE_SIZE - offset;
>                 [...]
> 
> It wasn't clear from the patch that this had been considered and that
> any extra space needed in the grant op array was made available.
> 

If I'm not mistaken the grant op array is already enormous. See the
comment in struct xen_netbk for grant_copy_op. The case that a buffer
straddles two slots was taken into consideration long ago -- that's
why you don't see any comment or code change WRT that...

> > From frontend's PoV I think the impact is minimal. Frontend is involved
> > in assembling the packets. It only takes what's in the ring and chain
> > them together. The operation involves copying so far is the
> > __pskb_pull_tail which happens a) in rare case when there's more frags
> > than frontend's MAX_SKB_FRAGS, b) when pull_to > skb_headlen which
> > happens. With Xi's change the rare case a) will even be rarer than
> > before as we use less slots. b) happens the same as it happens before
> > Xi's change, because the pull is guarded by "if (pull_to >
> > skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.
> > 
> > So overall I don't see obvious downside.
> 
> The obvious downside is it doesn't exist (in a form that can be applied
> now), it hasn't been tested and I think there may well be a subtle bug
> that would need a careful review or testing to confirm/deny.
> 

It does exist and apply cleanly on top of net tree. I haven't posted
it yet because we haven't reached concensus which path to take. :-)

The only reason that last version didn't get upstreamed is that the
commit message wasn't clear enough. From the technical PoV it's quite
sound and I believe Amazon has been using it for a long time -- the
older reference dates back to Aug 2012 IIRC. It's just never properly
upstreamed.

> You are free to work on this as a future improvements but I really don't
> see why this critical bug fix needs to be delayed any further.
> 

True. I don't mean to hold off critical fix. Just want to make sure that
every option is presented and considered.

Wei.

> David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-03 17:29 [PATCH] xen-netback: count number required slots for an skb more carefully David Vrabel
  2013-09-03 21:53 ` Wei Liu
@ 2013-09-04 18:48 ` David Miller
  2013-09-05  7:35   ` Ian Campbell
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2013-09-04 18:48 UTC (permalink / raw)
  To: david.vrabel
  Cc: xen-devel, konrad.wilk, boris.ostrovsky, ian.campbell, netdev


Can you guys just post the patch you want applied once this thread
moves closer to a consensus?  Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 18:48 ` David Miller
@ 2013-09-05  7:35   ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2013-09-05  7:35 UTC (permalink / raw)
  To: David Miller
  Cc: david.vrabel, xen-devel, konrad.wilk, boris.ostrovsky, netdev

On Wed, 2013-09-04 at 14:48 -0400, David Miller wrote:
> Can you guys just post the patch you want applied once this thread
> moves closer to a consensus?  Thanks.

Of course.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 15:44         ` Wei Liu
@ 2013-09-05 10:12           ` David Vrabel
  2013-09-05 10:27             ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2013-09-05 10:12 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell,
	netdev, msw, annie.li

On 04/09/13 16:44, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
> [...]
>>>>
>>>> I think I prefer fixing the counting for backporting to stable kernels.
>>>
>>> The original patch has coding style change. Sans that contextual change
>>> it's not a very long patch.
>>
>> The size of the patch isn't the main concern for backport-ability.  It's
>> the frontend visible changes and thus any (unexpected) impacts on
>> frontends -- this is especially important as only a small fraction of
>> frontends in use will be tested with these changes.
>>
>>>>  Xi's approach of packing the ring differently is a change in frontend
>>>> visible behaviour and seems more risky. e.g., possible performance
>>>> impact so I would like to see some performance analysis of that approach.
>>>>
>>>
>>> With Xi's approach it is more efficient for backend to process. As we
>>> now use one less grant copy operation which means we copy the same
>>> amount of data with less grant ops.
>>
>> It think it uses more grant ops because the copies of the linear
>> portion are in chunks that do not cross source page boundaries.
>>
>> i.e., in netbk_gop_skb():
>>
>> 	data = skb->data;
>> 	while (data < skb_tail_pointer(skb)) {
>> 		unsigned int offset = offset_in_page(data);
>> 		unsigned int len = PAGE_SIZE - offset;
>>                 [...]
>>
>> It wasn't clear from the patch that this had been considered and that
>> any extra space needed in the grant op array was made available.
>>
> 
> If I'm not mistaken the grant op array is already enormous. See the
> comment in struct xen_netbk for grant_copy_op. The case that a buffer
> straddles two slots was taken into consideration long ago -- that's
> why you don't see any comment or code change WRT that..

I'm not convinced that even that is enough for the current
implementation in the (very) worse case.

Consider a skb with 8 frags all 512 in length.  The linear data will be
placed into 1 slot, and the frags will be packed into 1 slot so 9 grant
ops and 2 slots.

I definitely think we do not want to potentially regress any further in
this area.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-05 10:12           ` David Vrabel
@ 2013-09-05 10:27             ` Wei Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-05 10:27 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev, msw, annie.li

On Thu, Sep 05, 2013 at 11:12:11AM +0100, David Vrabel wrote:
> On 04/09/13 16:44, Wei Liu wrote:
> > On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
> > [...]
> >>>>
> >>>> I think I prefer fixing the counting for backporting to stable kernels.
> >>>
> >>> The original patch has coding style change. Sans that contextual change
> >>> it's not a very long patch.
> >>
> >> The size of the patch isn't the main concern for backport-ability.  It's
> >> the frontend visible changes and thus any (unexpected) impacts on
> >> frontends -- this is especially important as only a small fraction of
> >> frontends in use will be tested with these changes.
> >>
> >>>>  Xi's approach of packing the ring differently is a change in frontend
> >>>> visible behaviour and seems more risky. e.g., possible performance
> >>>> impact so I would like to see some performance analysis of that approach.
> >>>>
> >>>
> >>> With Xi's approach it is more efficient for backend to process. As we
> >>> now use one less grant copy operation which means we copy the same
> >>> amount of data with less grant ops.
> >>
> >> It think it uses more grant ops because the copies of the linear
> >> portion are in chunks that do not cross source page boundaries.
> >>
> >> i.e., in netbk_gop_skb():
> >>
> >> 	data = skb->data;
> >> 	while (data < skb_tail_pointer(skb)) {
> >> 		unsigned int offset = offset_in_page(data);
> >> 		unsigned int len = PAGE_SIZE - offset;
> >>                 [...]
> >>
> >> It wasn't clear from the patch that this had been considered and that
> >> any extra space needed in the grant op array was made available.
> >>
> > 
> > If I'm not mistaken the grant op array is already enormous. See the
> > comment in struct xen_netbk for grant_copy_op. The case that a buffer
> > straddles two slots was taken into consideration long ago -- that's
> > why you don't see any comment or code change WRT that..
> 
> I'm not convinced that even that is enough for the current
> implementation in the (very) worse case.
> 
> Consider a skb with 8 frags all 512 in length.  The linear data will be
> placed into 1 slot, and the frags will be packed into 1 slot so 9 grant
> ops and 2 slots.
> 

That is still less than 8 * 2 = 16 slots in the grant copy array, right?
Basically each paged buffer is allowed for 2 slots in the grant copy
array.

> I definitely think we do not want to potentially regress any further in
> this area.
> 

Sure.

> David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
  2013-09-04 12:41     ` Ian Campbell
  2013-09-04 13:35       ` Wei Liu
@ 2013-09-09  9:20       ` Wei Liu
  1 sibling, 0 replies; 21+ messages in thread
From: Wei Liu @ 2013-09-09  9:20 UTC (permalink / raw)
  To: Ian Campbell
  Cc: David Vrabel, Wei Liu, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, netdev, msw, annie.li

On Wed, Sep 04, 2013 at 01:41:25PM +0100, Ian Campbell wrote:
[...]
> > > This bug was discussed back in July among Annie, Matt, Ian and I. We
> > > finally agreed to take Matt's solution. Matt agreed to post final
> > > version within a week but obviously he's too busy to do so. I was away
> > > so I didn't follow closely. Eventually it fell through the crack. :-(
> > 
> > I think I prefer fixing the counting for backporting to stable kernels.
> 
> That's a good argument. I think we should take this patch, or something
> very like it, now and then rebase the more complex thing on top.
> 

I take this as sort-of-ack by the maintainer.

David, please go ahead with your fix. I will post the other approach as
an improvement later on.

Wei.

> >  Xi's approach of packing the ring differently is a change in frontend
> > visible behaviour and seems more risky. e.g., possible performance
> > impact so I would like to see some performance analysis of that approach.
> 
> Yes.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-09-09  9:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-03 17:29 [PATCH] xen-netback: count number required slots for an skb more carefully David Vrabel
2013-09-03 21:53 ` Wei Liu
2013-09-04  2:25   ` annie li
2013-09-04  6:04     ` Matt Wilson
2013-09-04  6:56       ` [Xen-devel] " annie li
2013-09-04  7:38         ` Wei Liu
2013-09-04  8:06           ` annie li
2013-09-04  8:20             ` Wei Liu
2013-09-04  8:36       ` Ian Campbell
2013-09-04  8:41         ` Wei Liu
2013-09-04 11:48   ` [PATCH] " David Vrabel
2013-09-04 12:41     ` Ian Campbell
2013-09-04 13:35       ` Wei Liu
2013-09-09  9:20       ` Wei Liu
2013-09-04 13:14     ` Wei Liu
2013-09-04 14:02       ` David Vrabel
2013-09-04 15:44         ` Wei Liu
2013-09-05 10:12           ` David Vrabel
2013-09-05 10:27             ` Wei Liu
2013-09-04 18:48 ` David Miller
2013-09-05  7:35   ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).