* [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
@ 2014-05-16 11:08 Wei Liu
  2014-05-16 13:04 ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2014-05-16 11:08 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: Wei Liu, David Vrabel, Konrad Wilk, Boris Ostrovsky, Stefan Bader,
	Zoltan Kiss

Some workloads, such as Redis, can generate SKBs which make use of
compound pages. Netfront doesn't quite like that, because it doesn't want
to send a packet that occupies excessive slots to the backend, as the
backend might deem it malicious. On the flip side these packets are
actually legitimate; the size check at the beginning of xennet_start_xmit
ensures that the packet size is below 64K.

So we linearize the SKB if it occupies too many slots. If the
linearization fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netfront.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..b378dcd 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,20 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linearization: %d\n", slots);
+			goto drop;
+		}
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 11:08 [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots Wei Liu
@ 2014-05-16 13:04 ` Eric Dumazet
  2014-05-16 13:11   ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-05-16 13:04 UTC (permalink / raw)
  To: Wei Liu
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky,
	Stefan Bader, Zoltan Kiss

On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> Some workloads, such as Redis, can generate SKBs which make use of
> compound pages. Netfront doesn't quite like that, because it doesn't want
> to send a packet that occupies excessive slots to the backend, as the
> backend might deem it malicious. On the flip side these packets are
> actually legitimate; the size check at the beginning of xennet_start_xmit
> ensures that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the
> linearization fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)

This is likely to fail on a typical host.

What about adding a smart helper trying to aggregate the smallest
consecutive fragments into a single frag?

This would be needed for bnx2x for example as well.
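(For illustration only, a minimal sketch of what such a helper could look
like — the function name, the pairwise merge policy and the GFP_ATOMIC
choice are assumptions, not code from this thread; frag pages are assumed
to be in lowmem, and skb->truesize is only ever over-estimated, which is
the safe direction:)

static int skb_coalesce_small_frags(struct sk_buff *skb)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	int i = 0;

	while (i + 1 < shinfo->nr_frags) {
		skb_frag_t *a = &shinfo->frags[i];
		skb_frag_t *b = &shinfo->frags[i + 1];
		unsigned int len = skb_frag_size(a) + skb_frag_size(b);
		struct page *page;

		/* Only merge neighbours whose combined data fits one page. */
		if (len > PAGE_SIZE) {
			i++;
			continue;
		}

		page = alloc_page(GFP_ATOMIC);
		if (!page)
			return -ENOMEM;

		/* Pack both frags' data at the start of the fresh page. */
		memcpy(page_address(page),
		       skb_frag_address(a), skb_frag_size(a));
		memcpy(page_address(page) + skb_frag_size(a),
		       skb_frag_address(b), skb_frag_size(b));

		/* Drop the old references and install the merged page. */
		skb_frag_unref(skb, i);
		skb_frag_unref(skb, i + 1);
		__skb_frag_set_page(a, page);
		a->page_offset = 0;
		skb_frag_size_set(a, len);

		/* Close the hole left by frag i + 1. */
		memmove(b, b + 1, (shinfo->nr_frags - i - 2) * sizeof(*b));
		shinfo->nr_frags--;
		skb->truesize += PAGE_SIZE;
		/* Stay at i: the merged frag may absorb the next one too. */
	}

	return 0;
}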

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 13:04 ` Eric Dumazet
@ 2014-05-16 13:11   ` Wei Liu
  2014-05-16 14:21     ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2014-05-16 13:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader, Zoltan Kiss

On Fri, May 16, 2014 at 06:04:34AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> > Some workloads, such as Redis, can generate SKBs which make use of
> > compound pages. Netfront doesn't quite like that, because it doesn't want
> > to send a packet that occupies excessive slots to the backend, as the
> > backend might deem it malicious. On the flip side these packets are
> > actually legitimate; the size check at the beginning of xennet_start_xmit
> > ensures that the packet size is below 64K.
> > 
> > So we linearize the SKB if it occupies too many slots. If the
> > linearization fails then the SKB is dropped.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: David Vrabel <david.vrabel@citrix.com>
> > Cc: Konrad Wilk <konrad.wilk@oracle.com>
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Cc: Stefan Bader <stefan.bader@canonical.com>
> > Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> This is likely to fail on a typical host.
> 

It's not that common to trigger this; I have only seen a few reports. In fact
Stefan's report is the first one that comes with a method to reproduce
it.

I tested with redis-benchmark on a guest with 256MB RAM and only saw a
few "failed to linearize", never saw a single one with 1GB guest.

> What about adding a smart helper trying to aggregate the smallest
> consecutive fragments into a single frag?
> 

Ideally this would be a better approach, but I'm afraid I won't be able to
look into this until early or mid June.

Wei.

> This would be needed for bnx2x for example as well.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 13:11   ` Wei Liu
@ 2014-05-16 14:21     ` Eric Dumazet
  2014-05-16 14:36       ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-05-16 14:21 UTC (permalink / raw)
  To: Wei Liu
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky,
	Stefan Bader, Zoltan Kiss

On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:

> It's not that common to trigger this; I have only seen a few reports. In fact
> Stefan's report is the first one that comes with a method to reproduce
> it.
> 
> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> few "failed to linearize", never saw a single one with 1GB guest.

Well, I am just saying. This is asking for order-5 allocations, and yes,
this is going to fail after a few days of uptime, no matter what you try.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 14:21     ` Eric Dumazet
@ 2014-05-16 14:36       ` Wei Liu
  2014-05-16 15:22         ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2014-05-16 14:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader, Zoltan Kiss

On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> 
> > It's not that common to trigger this; I have only seen a few reports. In fact
> > Stefan's report is the first one that comes with a method to reproduce
> > it.
> > 
> > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > few "failed to linearize", never saw a single one with 1GB guest.
> 
> Well, I am just saying. This is asking for order-5 allocations, and yes,
> this is going to fail after a few days of uptime, no matter what you try.
> 

Hmm... I see what you mean -- memory fragmentation leads to allocation
failure. Thanks.

> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 14:36       ` Wei Liu
@ 2014-05-16 15:22         ` Eric Dumazet
  2014-05-16 15:34           ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-05-16 15:22 UTC (permalink / raw)
  To: Wei Liu
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky,
	Stefan Bader, Zoltan Kiss

On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> > On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> > 
> > > It's not that common to trigger this; I have only seen a few reports. In fact
> > > Stefan's report is the first one that comes with a method to reproduce
> > > it.
> > > 
> > > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > > few "failed to linearize", never saw a single one with 1GB guest.
> > 
> > Well, I am just saying. This is asking for order-5 allocations, and yes,
> > this is going to fail after a few days of uptime, no matter what you try.
> > 
> 
> Hmm... I see what you mean -- memory fragmentation leads to allocation
> failure. Thanks.

In the meantime, have you tried to lower gso_max_size?

Setting it with netif_set_gso_max_size() to something like 56000 might
avoid the problem.

(Not sure if it is applicable in your case)
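(For reference, a minimal sketch of where such a cap could be applied —
the hook point in the frontend's device setup, and the netdev variable
name, are assumptions:)

	/* Cap GSO payloads so skb_linearize() never needs an order-5 buffer. */
	netif_set_gso_max_size(netdev, 56000);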

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 15:22         ` Eric Dumazet
@ 2014-05-16 15:34           ` Wei Liu
  2014-05-16 16:29             ` Zoltan Kiss
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2014-05-16 15:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader, Zoltan Kiss

On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> > On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> > > On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> > > 
> > > > It's not that common to trigger this; I have only seen a few reports. In fact
> > > > Stefan's report is the first one that comes with a method to reproduce
> > > > it.
> > > > 
> > > > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > > > few "failed to linearize", never saw a single one with 1GB guest.
> > > 
> > > Well, I am just saying. This is asking for order-5 allocations, and yes,
> > > this is going to fail after a few days of uptime, no matter what you try.
> > > 
> > 
> > Hmm... I see what you mean -- memory fragmentation leads to allocation
> > failure. Thanks.
> 
> In the meantime, have you tried to lower gso_max_size?
> 
> Setting it with netif_set_gso_max_size() to something like 56000 might
> avoid the problem.
> 
> (Not sure if it is applicable in your case)
> 

It works, at least in this Redis testcase. Could you explain a bit where
this 56000 magic number comes from? :-)

Presumably I can derive it from some constant in core network code?

Wei.

> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 15:34           ` Wei Liu
@ 2014-05-16 16:29             ` Zoltan Kiss
  2014-05-16 16:47               ` Eric Dumazet
                                 ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Zoltan Kiss @ 2014-05-16 16:29 UTC (permalink / raw)
  To: Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky,
	Stefan Bader

On 16/05/14 16:34, Wei Liu wrote:
> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>
>>>>> It's not that common to trigger this; I have only seen a few reports. In fact
>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>> it.
>>>>>
>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>
>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>
>>>
>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>> failure. Thanks.
>>
>> In the meantime, have you tried to lower gso_max_size?
>>
>> Setting it with netif_set_gso_max_size() to something like 56000 might
>> avoid the problem.
>>
>> (Not sure if it is applicable in your case)
>>
> 
> It works, at least in this Redis testcase. Could you explain a bit where
> this 56000 magic number comes from? :-)
> 
> Presumably I can derive it from some constant in core network code?

I guess it just makes it less likely to have packets with a problematic
layout. But the following packet would still fail:
linear buffer: 80 bytes, on 2 pages
17 frags, 80 bytes each, each spanning a page boundary.
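(Counting with the driver's own formula, DIV_ROUND_UP(offset + len,
PAGE_SIZE) per buffer: that is 2 slots for the linear area plus
17 * 2 = 34 for the frags, 36 slots in total against the
MAX_SKB_FRAGS + 1 = 18 limit, for only 18 * 80 = 1440 bytes of payload.)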

I just had an idea: a modified version of the xenvif_handle_frag_list
function from netback would be useful for us here. It recreates the frags
array on fully utilized 4k pages. Plus we can use pskb_expand_head to
reduce the page count of the linear buffer (although it might not work,
see my comment in the patch).
The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
bytes. Then the frags after this coalescing should have at most
16*PAGE_SIZE - (N*PAGE_SIZE+1) bytes, which is 16-N pages. Altogether
that's 16+1 pages, which should definitely fit!
This is what I mean:

8<--------------
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 158b5e6..b1133d6 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -544,6 +544,73 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }
 
+int xenvif_reduce_pages(struct sk_buff *skb, int target)
+{
+	unsigned int offset = skb_headlen(skb);
+	skb_frag_t frags[MAX_SKB_FRAGS];
+	int newfrags, oldfrags;
+	unsigned int pages, optimal;
+
+	BUG_ON(!target);
+
+	pages = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), PAGE_SIZE);
+	optimal = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	if (pages - optimal) {
+		int err;
+/* FIXME: we should check if pskb_expand_head really allocates on page boundary,
+ *  otherwise we can still have suboptimal page layout */
+		if (unlikely(err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
+			return err;
+		target -= pages - optimal;
+		if (!target)
+			return 0;
+	}
+
+	/* Subtract frags size, we will correct it later */
+	skb->truesize -= skb->data_len;
+
+	/* Create a brand new frags array and coalesce there */
+	for (newfrags = 0; offset < skb->len; newfrags++) {
+		struct page *page;
+		unsigned int len;
+
+		BUG_ON(newfrags >= MAX_SKB_FRAGS);
+		page = alloc_page(GFP_ATOMIC);
+		if (!page) {
+			int j;
+			skb->truesize += skb->data_len;
+			for (j = 0; j < newfrags; j++)
+				put_page(frags[j].page.p);
+			return -ENOMEM;
+		}
+
+		if (offset + PAGE_SIZE < skb->len)
+			len = PAGE_SIZE;
+		else
+			len = skb->len - offset;
+		if (skb_copy_bits(skb, offset, page_address(page), len))
+			BUG();
+
+		offset += len;
+		frags[newfrags].page.p = page;
+		frags[newfrags].page_offset = 0;
+		skb_frag_size_set(&frags[newfrags], len);
+	}
+
+	/* Drop the original buffers */
+	for (oldfrags = 0; oldfrags < skb_shinfo(skb)->nr_frags; oldfrags++)
+		skb_frag_unref(skb, oldfrags);
+
+	/* Swap the new frags array with the old one */
+	memcpy(skb_shinfo(skb)->frags,
+	       frags,
+	       newfrags * sizeof(skb_frag_t));
+	skb_shinfo(skb)->nr_frags = newfrags;
+	/* Correct truesize */
+	skb->truesize += newfrags * PAGE_SIZE;
+	return 0;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -573,11 +640,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (unlikely(xenvif_reduce_pages(skb, slots - MAX_SKB_FRAGS - 1))) {
+			net_alert_ratelimited(
+				"xennet: couldn't reduce slot number from %d\n", slots);
+			goto drop;
+		}
+		slots = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb),
+				     PAGE_SIZE) + xennet_count_skb_frag_slots(skb);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: slot reduction doesn't work, slots: %d\n", slots);
+			goto drop;
+		}
 	}
 
+
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:29             ` Zoltan Kiss
@ 2014-05-16 16:47               ` Eric Dumazet
  2014-05-16 16:51                 ` Zoltan Kiss
  2014-05-16 16:54               ` Wei Liu
  2014-05-30  8:06               ` Stefan Bader
  2 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-05-16 16:47 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader

On Fri, 2014-05-16 at 17:29 +0100, Zoltan Kiss wrote:
> On 16/05/14 16:34, Wei Liu wrote:
> > 
> > It works, at least in this Redis testcase. Could you explain a bit where
> > this 56000 magic number comes from? :-)
> > 
> > Presumably I can derive it from some constant in core network code?
> 
> I guess it just makes it less likely to have packets with a problematic
> layout. But the following packet would still fail:
> linear buffer: 80 bytes, on 2 pages
> 17 frags, 80 bytes each, each spanning a page boundary.

How would you build such skbs? It's _very_ difficult; you have to be
very, very smart to hit this.

Also reducing gso_max_size made sure order-5 allocations would not be
attempted in this unlikely case.

56000 + overhead is below 65536 -> order-4 allocations at most.
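(The arithmetic, as a standalone userspace C sketch — the 256-byte
header allowance is an assumption; the power-of-two rounding mirrors
what kmalloc() does to the linear buffer skb_linearize() asks for:)

#include <stdio.h>

static unsigned int alloc_order(unsigned long bytes, unsigned long page_size)
{
	unsigned long rounded = page_size;
	unsigned int order = 0;

	/* kmalloc() rounds large buffers up to the next power of two. */
	while (rounded < bytes) {
		rounded <<= 1;
		order++;
	}
	return order;
}

int main(void)
{
	/* 64K of payload plus headers -> 128K slab -> order-5 */
	printf("gso 65536 -> order-%u\n", alloc_order(65536 + 256, 4096));
	/* 56000 plus headers still fits 64K -> order-4 */
	printf("gso 56000 -> order-%u\n", alloc_order(56000 + 256, 4096));
	return 0;
}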

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:47               ` Eric Dumazet
@ 2014-05-16 16:51                 ` Zoltan Kiss
  2014-05-16 17:00                   ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Zoltan Kiss @ 2014-05-16 16:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader

On 16/05/14 17:47, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 17:29 +0100, Zoltan Kiss wrote:
>> On 16/05/14 16:34, Wei Liu wrote:
>>>
>>> It works, at least in this Redis testcase. Could you explain a bit where
>>> this 56000 magic number comes from? :-)
>>>
>>> Presumably I can derive it from some constant in core network code?
>>
>> I guess it just makes it less likely to have packets with a problematic
>> layout. But the following packet would still fail:
>> linear buffer: 80 bytes, on 2 pages
>> 17 frags, 80 bytes each, each spanning a page boundary.
>
> How would you build such skbs? It's _very_ difficult; you have to be
> very, very smart to hit this.
I wouldn't build such skbs; I would expect the network stack to create
such weird things sometimes :)
The goal here is to prepare for and handle the worst case scenarios as well.
>
> Also reducing gso_max_size made sure order-5 allocations would not be
> attempted in this unlikely case.
But reducing the gso_max_size would have a bad impact on the general 
network throughput, right?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:29             ` Zoltan Kiss
  2014-05-16 16:47               ` Eric Dumazet
@ 2014-05-16 16:54               ` Wei Liu
  2014-05-19 16:47                 ` Zoltan Kiss
  2014-05-30  8:06               ` Stefan Bader
  2 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2014-05-16 16:54 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, Eric Dumazet, netdev, xen-devel, David Vrabel,
	Konrad Wilk, Boris Ostrovsky, Stefan Bader

On Fri, May 16, 2014 at 05:29:05PM +0100, Zoltan Kiss wrote:
> On 16/05/14 16:34, Wei Liu wrote:
> > On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
> >> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> >>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> >>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> >>>>
> >>>>> It's not that common to trigger this; I have only seen a few reports. In fact
> >>>>> Stefan's report is the first one that comes with a method to reproduce
> >>>>> it.
> >>>>>
> >>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> >>>>> few "failed to linearize", never saw a single one with 1GB guest.
> >>>>
> >>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
> >>>> this is going to fail after a few days of uptime, no matter what you try.
> >>>>
> >>>
> >>> Hmm... I see what you mean -- memory fragmentation leads to allocation
> >>> failure. Thanks.
> >>
> >> In the meantime, have you tried to lower gso_max_size?
> >>
> >> Setting it with netif_set_gso_max_size() to something like 56000 might
> >> avoid the problem.
> >>
> >> (Not sure if it is applicable in your case)
> >>
> > 
> > It works, at least in this Redis testcase. Could you explain a bit where
> > this 56000 magic number comes from? :-)
> > 
> > Presumably I can derive it from some constant in core network code?
> 
> I guess it just makes it less likely to have packets with a problematic
> layout. But the following packet would still fail:
> linear buffer: 80 bytes, on 2 pages
> 17 frags, 80 bytes each, each spanning a page boundary.
> 

Presumably the max GSO size affects packet layout; that's what I was trying
to figure out.

> I just had an idea: a modified version of the xenvif_handle_frag_list
> function from netback would be useful for us here. It recreates the frags
> array on fully utilized 4k pages. Plus we can use pskb_expand_head to
> reduce the page count of the linear buffer (although it might not work,
> see my comment in the patch).
> The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
> bytes. Then the frags after this coalescing should have at most
> 16*PAGE_SIZE - (N*PAGE_SIZE+1) bytes, which is 16-N pages. Altogether
> that's 16+1 pages, which should definitely fit!
> This is what I mean:
> 
> 8<--------------
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 158b5e6..b1133d6 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -544,6 +544,73 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>  	return pages;
>  }
>  
> +int xenvif_reduce_pages(struct sk_buff *skb, int target)
> +{
> +	unsigned int offset = skb_headlen(skb);
> +	skb_frag_t frags[MAX_SKB_FRAGS];
> +	int newfrags, oldfrags;
> +	unsigned int pages, optimal;
> +
> +	BUG_ON(!target);
> +
> +	pages = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), PAGE_SIZE);
> +	optimal = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +	if (pages - optimal) {
> +		int err;
> +/* FIXME: we should check if pskb_expand_head really allocates on page boundary,
> + *  otherwise we can still have suboptimal page layout */
> +		if (unlikely(err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))

I'm a bit lost. What do you expect from the call to pskb_expand_head?

I'm sorry, I cannot see the intended result from the comment above
pskb_expand_head. If you call it with nhead and ntail equal to 0 it creates
an identical copy, but I don't see a guarantee on page alignment. Did I miss
something?

> +			return err;
> +		target -= pages - optimal;
> +		if (!target)
> +			return 0;
> +	}
> +
> +	/* Subtract frags size, we will correct it later */
> +	skb->truesize -= skb->data_len;
> +
> +	/* Create a brand new frags array and coalesce there */
> +	for (newfrags = 0; offset < skb->len; newfrags++) {
> +		struct page *page;
> +		unsigned int len;
> +
> +		BUG_ON(newfrags >= MAX_SKB_FRAGS);
> +		page = alloc_page(GFP_ATOMIC);

And the amount of memory allocation is a bit overkill, I think (though
it's still better than the order-5 allocation in skb_linearize). Can you
not just memmove all paged data to the first few frags and release the
other frags?

Anyway, this method might still work, just a bit overkill IMHO.

Wei.

> +		if (!page) {
> +			int j;
> +			skb->truesize += skb->data_len;
> +			for (j = 0; j < newfrags; j++)
> +				put_page(frags[j].page.p);
> +			return -ENOMEM;
> +		}
> +
> +		if (offset + PAGE_SIZE < skb->len)
> +			len = PAGE_SIZE;
> +		else
> +			len = skb->len - offset;
> +		if (skb_copy_bits(skb, offset, page_address(page), len))
> +			BUG();
> +
> +		offset += len;
> +		frags[newfrags].page.p = page;
> +		frags[newfrags].page_offset = 0;
> +		skb_frag_size_set(&frags[newfrags], len);
> +	}
> +
> +	/* Drop the original buffers */
> +	for (oldfrags = 0; oldfrags < skb_shinfo(skb)->nr_frags; oldfrags++)
> +		skb_frag_unref(skb, oldfrags);
> +
> +	/* Swap the new frags array with the old one */
> +	memcpy(skb_shinfo(skb)->frags,
> +	       frags,
> +	       newfrags * sizeof(skb_frag_t));
> +	skb_shinfo(skb)->nr_frags = newfrags;
> +	/* Correct truesize */
> +	skb->truesize += newfrags * PAGE_SIZE;
> +	return 0;
> +}
> +
>  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	unsigned short id;
> @@ -573,11 +640,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (unlikely(xenvif_reduce_pages(skb, slots - MAX_SKB_FRAGS - 1))) {
> +			net_alert_ratelimited(
> +				"xennet: couldn't reduce slot number from %d\n", slots);
> +			goto drop;
> +		}
> +		slots = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb),
> +				     PAGE_SIZE) + xennet_count_skb_frag_slots(skb);
> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: slot reduction doesn't work, slots: %d\n", slots);
> +			goto drop;
> +		}
>  	}
>  
> +
>  	spin_lock_irqsave(&np->tx_lock, flags);
>  
>  	if (unlikely(!netif_carrier_ok(dev) ||

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:51                 ` Zoltan Kiss
@ 2014-05-16 17:00                   ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-05-16 17:00 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader

On Fri, 2014-05-16 at 17:51 +0100, Zoltan Kiss wrote:

> But reducing the gso_max_size would have a bad impact on the general 
> network throughput, right?

Not really.

Sending TSO packets with 44 MSS in them instead of 45 might in fact help
a bit. I could never detect any difference.

LRO/GRO receivers aggregate at most 17 or 22 segments anyway.

I said 56000 only as a quick way to test the thing.

If you want an explicit formula, it might be:

SKB_WITH_OVERHEAD(65535 - LL_MAX_HEADER) or something like that...
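(Applied to the device, that would be something like the sketch below;
the call site and netdev variable are illustrative, while
SKB_WITH_OVERHEAD and LL_MAX_HEADER are the existing core macros:)

	/* Largest payload whose linearized buffer, link-layer header and
	 * skb_shared_info overhead included, still fits an order-4 (64K)
	 * allocation. */
	netif_set_gso_max_size(netdev,
			       SKB_WITH_OVERHEAD(65535 - LL_MAX_HEADER));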

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:54               ` Wei Liu
@ 2014-05-19 16:47                 ` Zoltan Kiss
  0 siblings, 0 replies; 23+ messages in thread
From: Zoltan Kiss @ 2014-05-19 16:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: Eric Dumazet, netdev, xen-devel, David Vrabel, Konrad Wilk,
	Boris Ostrovsky, Stefan Bader

On 16/05/14 17:54, Wei Liu wrote:
>> I just had an idea: a modified version of the xenvif_handle_frag_list
>> function from netback would be useful for us here. It recreates the frags
>> array on fully utilized 4k pages. Plus we can use pskb_expand_head to
>> reduce the page count of the linear buffer (although it might not work,
>> see my comment in the patch).
>> The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
>> bytes. Then the frags after this coalescing should have at most
>> 16*PAGE_SIZE - (N*PAGE_SIZE+1) bytes, which is 16-N pages. Altogether
>> that's 16+1 pages, which should definitely fit!
>> This is what I mean:
>>
>> 8<--------------
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 158b5e6..b1133d6 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -544,6 +544,73 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>>   	return pages;
>>   }
>>
>> +int xenvif_reduce_pages(struct sk_buff *skb, int target)
>> +{
>> +	unsigned int offset = skb_headlen(skb);
>> +	skb_frag_t frags[MAX_SKB_FRAGS];
>> +	int newfrags, oldfrags;
>> +	unsigned int pages, optimal;
>> +
>> +	BUG_ON(!target);
>> +
>> +	pages = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), PAGE_SIZE);
>> +	optimal = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
>> +	if (pages - optimal) {
>> +		int err;
>> +/* FIXME: we should check if pskb_expand_head really allocates on page boundary,
>> + *  otherwise we can still have suboptimal page layout */
>> +		if (unlikely(err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
>
> I'm a bit lost. What do you expect from the call to pskb_expand_head?
>
> I'm sorry, I cannot see the intended result from the comment above
> pskb_expand_head. If you call it with nhead and ntail equal to 0 it creates
> an identical copy, but I don't see a guarantee on page alignment. Did I miss
> something?
Yep, indeed, it doesn't directly guarantee that the new allocation won't
span across page boundaries unnecessarily. And actually we are still OK,
as the skb shouldn't take more than 18 slots, so netback should be able to
handle that.

>
>> +			return err;
>> +		target -= pages - optimal;
>> +		if (!target)
>> +			return 0;
>> +	}
>> +
>> +	/* Subtract frags size, we will correct it later */
>> +	skb->truesize -= skb->data_len;
>> +
>> +	/* Create a brand new frags array and coalesce there */
>> +	for (newfrags = 0; offset < skb->len; newfrags++) {
>> +		struct page *page;
>> +		unsigned int len;
>> +
>> +		BUG_ON(newfrags >= MAX_SKB_FRAGS);
>> +		page = alloc_page(GFP_ATOMIC);
>
> And the amount of memory allocation is a bit overkill, I think (though
> it's still better than the order-5 allocation in skb_linearize). Can you
> not just memmove all paged data to the first few frags and release the
> other frags?
>
> Anyway, this method might still work, just a bit overkill IMHO.

Yep, it's quite suboptimal, and anyone can come up with a better (and 
probably more complex) solution, however:
- this should be a rarely used thing, so performance doesn't matter that 
much at the moment (however who knows under which workload you can end 
up with skbs often fragmented so badly that you see this function called 
...)
- it would be good to create a fix for this soon, and get it backported
to major distro kernels where compound pages are enabled

Zoli

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-16 16:29             ` Zoltan Kiss
  2014-05-16 16:47               ` Eric Dumazet
  2014-05-16 16:54               ` Wei Liu
@ 2014-05-30  8:06               ` Stefan Bader
  2014-05-30 12:07                 ` Zoltan Kiss
  2014-05-30 12:11                 ` Wei Liu
  2 siblings, 2 replies; 23+ messages in thread
From: Stefan Bader @ 2014-05-30  8:06 UTC (permalink / raw)
  To: Zoltan Kiss, Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky

On 16.05.2014 18:29, Zoltan Kiss wrote:
> On 16/05/14 16:34, Wei Liu wrote:
>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>>
>>>>>> It's not that common to trigger this; I have only seen a few reports. In fact
>>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>>> it.
>>>>>>
>>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>>
>>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>>
>>>>
>>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>>> failure. Thanks.
>>>
>>> In the meantime, have you tried to lower gso_max_size?
>>>
>>> Setting it with netif_set_gso_max_size() to something like 56000 might
>>> avoid the problem.
>>>
>>> (Not sure if it is applicable in your case)
>>>
>>
>> It works, at least in this Redis testcase. Could you explain a bit where
>> this 56000 magic number comes from? :-)
>>
>> Presumably I can derive it from some constant in core network code?
> 
> I guess it just makes it less likely to have packets with a problematic
> layout. But the following packet would still fail:
> linear buffer: 80 bytes, on 2 pages
> 17 frags, 80 bytes each, each spanning a page boundary.
> 
> I just had an idea: a modified version of the xenvif_handle_frag_list
> function from netback would be useful for us here. It recreates the frags
> array on fully utilized 4k pages. Plus we can use pskb_expand_head to
> reduce the page count of the linear buffer (although it might not work,
> see my comment in the patch).
> The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
> bytes. Then the frags after this coalescing should have at most
> 16*PAGE_SIZE - (N*PAGE_SIZE+1) bytes, which is 16-N pages. Altogether
> that's 16+1 pages, which should definitely fit!
> This is what I mean:
>

I had been idly wondering about this since then. While trying to understand the
whole skb handling environment, I tried to come up with an idea as well. It may
be totally stupid and based on wrong assumptions. It seems to work, in the sense
that things did not blow up in my face immediately, and somehow I did not see
dropped packets due to the number of slots either.
But again, I am not sure I am doing the right thing. The idea was to just try to
get rid of so many compound pages (which I believe are the only ones that can
have an offset big enough to allow some alignment savings)...

-Stefan


From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Thu, 29 May 2014 12:18:01 +0200
Subject: [PATCH] xen-netfront: Align frags to fit max slots

In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
(= 18) granted 4K pages, try to reduce the footprint by moving the data
to new pages, aligned to their beginning.
Then replace the page in the frag and release the old one. This is
certainly more expensive in compute, but it should not happen too often
and sounds better than just dropping the packet in that case.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 158b5e6..ad71e5c 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }

+/*
+ * Align data to new pages in order to save slots required to
+ * transmit this buffer.
+ * @skb - socket buffer
+ * @target - number of pages to save
+ * returns the number of pages the fragments have been reduced by
+ */
+static int xennet_align_frags(struct sk_buff *skb, int target)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int reduced = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct page *fpage = skb_frag_page(frag);
+		struct page *npage;
+		unsigned long size;
+		unsigned long offset;
+		gfp_t gfp;
+		int order;
+
+		if (!PageCompound(fpage))
+			continue;
+
+		size = skb_frag_size(frag);
+		offset = frag->page_offset & ~PAGE_MASK;
+
+		/*
+		 * If the length of data in the last subpage of a compound
+		 * page is smaller than the offset into the first data sub-
+		 * page, we can save a subpage by copying data around.
+		 */
+		if ( ((offset + size) & ~PAGE_MASK) > offset )
+			continue;
+
+		gfp = GFP_ATOMIC | __GFP_COLD;
+		order = get_order(size);
+		if (order)
+			gfp |= __GFP_COMP | __GFP_NOWARN;
+
+		npage = alloc_pages(gfp, order);
+		if (!npage)
+			break;
+		memcpy(page_address(npage), skb_frag_address(frag), size);
+		frag->page.p = npage;
+		frag->page_offset = 0;
+		put_page(fpage);
+
+		if (++reduced >= target)
+			break;
+	}
+
+	return reduced;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
+		if (slots > MAX_SKB_FRAGS + 1) {
+			net_alert_ratelimited(
+				"xennet: skb rides the rocket: %d slots\n",
+				slots);
+			goto drop;
+		}
 	}

 	spin_lock_irqsave(&np->tx_lock, flags);
-- 
1.9.1





^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30  8:06               ` Stefan Bader
@ 2014-05-30 12:07                 ` Zoltan Kiss
  2014-05-30 12:37                   ` Stefan Bader
  2014-07-02 12:23                   ` Stefan Bader
  2014-05-30 12:11                 ` Wei Liu
  1 sibling, 2 replies; 23+ messages in thread
From: Zoltan Kiss @ 2014-05-30 12:07 UTC (permalink / raw)
  To: Stefan Bader, Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky

On 30/05/14 09:06, Stefan Bader wrote:
> On 16.05.2014 18:29, Zoltan Kiss wrote:
>> On 16/05/14 16:34, Wei Liu wrote:
>>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>>>
>>>>>>> It's not that common to trigger this; I have only seen a few reports. In fact
>>>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>>>> it.
>>>>>>>
>>>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>>>
>>>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>>>> failure. Thanks.
>>>> In the meantime, have you tried to lower gso_max_size?
>>>>
>>>> Setting it with netif_set_gso_max_size() to something like 56000 might
>>>> avoid the problem.
>>>>
>>>> (Not sure if it is applicable in your case)
>>>>
>>> It works, at least in this Redis testcase. Could you explain a bit where
>>> this 56000 magic number comes from? :-)
>>>
>>> Presumably I can derive it from some constant in core network code?
>> I guess it just makes it less likely to have packets with a problematic
>> layout. But the following packet would still fail:
>> linear buffer: 80 bytes, on 2 pages
>> 17 frags, 80 bytes each, each spanning a page boundary.
>>
>> I just had an idea: a modified version of the xenvif_handle_frag_list
>> function from netback would be useful for us here. It recreates the frags
>> array on fully utilized 4k pages. Plus we can use pskb_expand_head to
>> reduce the page count of the linear buffer (although it might not work,
>> see my comment in the patch).
>> The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
>> bytes. Then the frags after this coalescing should have at most
>> 16*PAGE_SIZE - (N*PAGE_SIZE+1) bytes, which is 16-N pages. Altogether
>> that's 16+1 pages, which should definitely fit!
>> This is what I mean:
>>
> I had been idly wondering about this since then. While trying to understand the
> whole skb handling environment, I tried to come up with an idea as well. It may
> be totally stupid and based on wrong assumptions. It seems to work, in the sense
> that things did not blow up in my face immediately, and somehow I did not see
> dropped packets due to the number of slots either.
> But again, I am not sure I am doing the right thing. The idea was to just try to
> get rid of so many compound pages (which I believe are the only ones that can
> have an offset big enough to allow some alignment savings)...
Hi,

This probably helps in a lot of scenarios, but not for everything. E.g.
if the skb has 15 frags of PAGE_SIZE+1 bytes each, it'll still consume 30
slots for the frags, and this function won't be able to help with that.
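(The arithmetic behind that example, as a standalone sketch; 4k pages
assumed, and the per-frag formula is the one xennet_count_skb_frag_slots
effectively applies:)

#include <stdio.h>

#define PAGE_SIZE		4096UL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long offset = 0, size = PAGE_SIZE + 1;
	/* Pages (grant slots) covered by one frag, as xennet counts them. */
	unsigned long per_frag = DIV_ROUND_UP(offset + size, PAGE_SIZE);

	/* 15 frags x 2 slots = 30 slots, well above MAX_SKB_FRAGS + 1 = 18;
	 * no realignment can shrink a PAGE_SIZE+1 frag below two pages. */
	printf("15 x %lu = %lu slots\n", per_frag, 15 * per_frag);
	return 0;
}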
It's hard to come up with an algorithm which handles all the scenarios
we can have here and does so with the least possible amount of copying.
I'll keep looking into this in the coming weeks, but don't hesitate if
you have an idea!

Regards,

Zoltan
>
> -Stefan
>
>
>  From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Thu, 29 May 2014 12:18:01 +0200
> Subject: [PATCH] xen-netfront: Align frags to fit max slots
>
> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
> (= 18) granted 4K pages, try to reduce the footprint by moving the data
> to new pages, aligned to their beginning.
> Then replace the page in the frag and release the old one. This is
> certainly more expensive in compute, but it should not happen too often
> and sounds better than just dropping the packet in that case.
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>   drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 158b5e6..ad71e5c 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>   	return pages;
>   }
>
> +/*
> + * Align data to new pages in order to save slots required to
> + * transmit this buffer.
> + * @skb - socket buffer
> + * @target - number of pages to save
> + * returns the number of pages the fragments have been reduced by
> + */
> +static int xennet_align_frags(struct sk_buff *skb, int target)
> +{
> +	int i, frags = skb_shinfo(skb)->nr_frags;
> +	int reduced = 0;
> +
> +	for (i = 0; i < frags; i++) {
> +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *fpage = skb_frag_page(frag);
> +		struct page *npage;
> +		unsigned long size;
> +		unsigned long offset;
> +		gfp_t gfp;
> +		int order;
> +
> +		if (!PageCompound(fpage))
> +			continue;
> +
> +		size = skb_frag_size(frag);
> +		offset = frag->page_offset & ~PAGE_MASK;
> +
> +		/*
> +		 * If the length of data in the last subpage of a compound
> +		 * page is smaller than the offset into the first data sub-
> +		 * page, we can save a subpage by copying data around.
> +		 */
> +		if ( ((offset + size) & ~PAGE_MASK) > offset )
> +			continue;
> +
> +		gfp = GFP_ATOMIC | __GFP_COLD;
> +		order = get_order(size);
> +		if (order)
> +			gfp |= __GFP_COMP | __GFP_NOWARN;
> +
> +		npage = alloc_pages(gfp, order);
> +		if (!npage)
> +			break;
> +		memcpy(page_address(npage), skb_frag_address(frag), size);
> +		frag->page.p = npage;
> +		frag->page_offset = 0;
> +		put_page(fpage);
> +
> +		if (++reduced >= target)
> +			break;
> +	}
> +
> +	return reduced;
> +}
> +
>   static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   {
>   	unsigned short id;
> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>   		xennet_count_skb_frag_slots(skb);
>   	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
> +		if (slots > MAX_SKB_FRAGS + 1) {
> +			net_alert_ratelimited(
> +				"xennet: skb rides the rocket: %d slots\n",
> +				slots);
> +			goto drop;
> +		}
>   	}
>
>   	spin_lock_irqsave(&np->tx_lock, flags);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30  8:06               ` Stefan Bader
  2014-05-30 12:07                 ` Zoltan Kiss
@ 2014-05-30 12:11                 ` Wei Liu
  2014-05-30 12:28                   ` Stefan Bader
  2014-05-30 12:28                   ` David Laight
  1 sibling, 2 replies; 23+ messages in thread
From: Wei Liu @ 2014-05-30 12:11 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Zoltan Kiss, Wei Liu, Eric Dumazet, netdev, xen-devel,
	David Vrabel, Konrad Wilk, Boris Ostrovsky

On Fri, May 30, 2014 at 10:06:48AM +0200, Stefan Bader wrote:
[...]
> I had been idly wondering about this since then. While trying to understand the
> whole skb handling environment, I tried to come up with an idea as well. It may
> be totally stupid and based on wrong assumptions. It seems to work, in the sense
> that things did not blow up in my face immediately, and somehow I did not see
> dropped packets due to the number of slots either.
> But again, I am not sure I am doing the right thing. The idea was to just try to
> get rid of so many compound pages (which I believe are the only ones that can
> have an offset big enough to allow some alignment savings)...
> 
> -Stefan
> 

Thanks. I think the general idea is OK, but it still involves
unnecessary page allocation. We don't actually need to get rid of the
compound page by replacing it with a new page; we just need to make sure
the data inside is aligned.

If you look at xennet_make_frags, it only grants the 4K pages which
contain data. I presume a simple memmove would be better than alloc_page
+ memcpy. What do you think?

Like:
   memmove(page_address(fpage), page_address(fpage)+offset, size);
   frag->page_offset = 0;

Wei.

> 
> From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Thu, 29 May 2014 12:18:01 +0200
> Subject: [PATCH] xen-netfront: Align frags to fit max slots
> 
> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
> (= 18) granted 4K pages, try to reduce the footprint by moving the data
> to new pages, aligned to their beginning.
> Then replace the page in the frag and release the old one. This is
> certainly more expensive in compute, but it should not happen too often
> and sounds better than just dropping the packet in that case.
> 
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 158b5e6..ad71e5c 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>  	return pages;
>  }
> 
> +/*
> + * Align data to new pages in order to save slots required to
> + * transmit this buffer.
> + * @skb - socket buffer
> + * @target - number of pages to save
> + * returns the number of pages the fragments have been reduced by
> + */
> +static int xennet_align_frags(struct sk_buff *skb, int target)
> +{
> +	int i, frags = skb_shinfo(skb)->nr_frags;
> +	int reduced = 0;
> +
> +	for (i = 0; i < frags; i++) {
> +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *fpage = skb_frag_page(frag);
> +		struct page *npage;
> +		unsigned long size;
> +		unsigned long offset;
> +		gfp_t gfp;
> +		int order;
> +
> +		if (!PageCompound(fpage))
> +			continue;
> +
> +		size = skb_frag_size(frag);
> +		offset = frag->page_offset & ~PAGE_MASK;
> +
> +		/*
> +		 * If the length of data in the last subpage of a compound
> +		 * page is smaller than the offset into the first data sub-
> +		 * page, we can save a subpage by copying data around.
> +		 */
> +		if ( ((offset + size) & ~PAGE_MASK) > offset )
> +			continue;
> +
> +		gfp = GFP_ATOMIC | __GFP_COLD;
> +		order = get_order(size);
> +		if (order)
> +			gfp |= __GFP_COMP | __GFP_NOWARN;
> +
> +		npage = alloc_pages(gfp, order);
> +		if (!npage)
> +			break;
> +		memcpy(page_address(npage), skb_frag_address(frag), size);
> +		frag->page.p = npage;
> +		frag->page_offset = 0;
> +		put_page(fpage);
> +
> +		if (++reduced >= target)
> +			break;
> +	}
> +
> +	return reduced;
> +}
> +
>  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	unsigned short id;
> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
> +		if (slots > MAX_SKB_FRAGS + 1) {
> +			net_alert_ratelimited(
> +				"xennet: skb rides the rocket: %d slots\n",
> +				slots);
> +			goto drop;
> +		}
>  	}
> 
>  	spin_lock_irqsave(&np->tx_lock, flags);
> -- 
> 1.9.1
> 
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:11                 ` Wei Liu
@ 2014-05-30 12:28                   ` Stefan Bader
  2014-05-30 12:38                     ` Wei Liu
  2014-05-30 12:28                   ` David Laight
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Bader @ 2014-05-30 12:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: Zoltan Kiss, Eric Dumazet, netdev, xen-devel, David Vrabel,
	Konrad Wilk, Boris Ostrovsky

On 30.05.2014 14:11, Wei Liu wrote:
> On Fri, May 30, 2014 at 10:06:48AM +0200, Stefan Bader wrote:
> [...]
>> I had been idly wondering about this since then. While trying to understand the
>> whole skb handling environment, I tried to come up with an idea as well. It may
>> be totally stupid and based on wrong assumptions. It seems to work, in the sense
>> that things did not blow up in my face immediately, and somehow I did not see
>> dropped packets due to the number of slots either.
>> But again, I am not sure I am doing the right thing. The idea was to just try to
>> get rid of so many compound pages (which I believe are the only ones that can
>> have an offset big enough to allow some alignment savings)...
>>
>> -Stefan
>>
> 
> Thanks. I think the general idea is OK, but it still involves
> unnecessary page allocation. We don't actually need to get rid of the
> compound page by replacing it with a new page; we just need to make sure
> the data inside is aligned.
> 
> If you look at xennet_make_frags, it only grants the 4K pages which
> contain data. I presume a simple memmove would be better than alloc_page
> + memcpy. What do you think?
> 
> Like:
>    memmove(page_address(fpage), page_address(fpage)+offset, size);
>    frag->page_offset = 0;

I was hesitating to do that as I could not really tell whether I can make any
assumptions about those memory areas. So my cautious guess was to leave the
original pages alone altogether (maybe whoever owns those has something
important in the starting area). If I did it right, my new page potentially
could be a smaller allocation unit than the original, since I only ask for
something big enough to hold the frag size (ignoring a potential offset over
full 4K areas).

-Stefan
> 
> Wei.
> 
>>
>> From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Thu, 29 May 2014 12:18:01 +0200
>> Subject: [PATCH] xen-netfront: Align frags to fit max slots
>>
>> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
>> (= 18) granted 4K pages, try to reduce the footprint by moving the data
>> to new pages, aligned to their beginning.
>> Then replace the page in the frag and release the old one. This is
>> certainly more expensive in compute, but it should not happen too often
>> and sounds better than just dropping the packet in that case.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>  drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 158b5e6..ad71e5c 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>>  	return pages;
>>  }
>>
>> +/*
>> + * Align data to new pages in order to save slots required to
>> + * transmit this buffer.
>> + * @skb - socket buffer
>> + * @target - number of pages to save
>> + * returns the number of pages the fragments have been reduced by
>> + */
>> +static int xennet_align_frags(struct sk_buff *skb, int target)
>> +{
>> +	int i, frags = skb_shinfo(skb)->nr_frags;
>> +	int reduced = 0;
>> +
>> +	for (i = 0; i < frags; i++) {
>> +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> +		struct page *fpage = skb_frag_page(frag);
>> +		struct page *npage;
>> +		unsigned long size;
>> +		unsigned long offset;
>> +		gfp_t gfp;
>> +		int order;
>> +
>> +		if (!PageCompound(fpage))
>> +			continue;
>> +
>> +		size = skb_frag_size(frag);
>> +		offset = frag->page_offset & ~PAGE_MASK;
>> +
>> +		/*
>> +		 * If the length of data in the last subpage of a compound
>> +		 * page is smaller than the offset into the first data sub-
>> +		 * page, we can save a subpage by copying data around.
>> +		 */
>> +		if ( ((offset + size) & ~PAGE_MASK) > offset )
>> +			continue;
>> +
>> +		gfp = GFP_ATOMIC | __GFP_COLD;
>> +		order = get_order(size);
>> +		if (order)
>> +			gfp |= __GFP_COMP | __GFP_NOWARN;
>> +
>> +		npage = alloc_pages(gfp, order);
>> +		if (!npage)
>> +			break;
>> +		memcpy(page_address(npage), skb_frag_address(frag), size);
>> +		frag->page.p = npage;
>> +		frag->page_offset = 0;
>> +		put_page(fpage);
>> +
>> +		if (++reduced >= target)
>> +			break;
>> +	}
>> +
>> +	return reduced;
>> +}
>> +
>>  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>  {
>>  	unsigned short id;
>> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>  		xennet_count_skb_frag_slots(skb);
>>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> -		net_alert_ratelimited(
>> -			"xennet: skb rides the rocket: %d slots\n", slots);
>> -		goto drop;
>> +		slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
>> +		if (slots > MAX_SKB_FRAGS + 1) {
>> +			net_alert_ratelimited(
>> +				"xennet: skb rides the rocket: %d slots\n",
>> +				slots);
>> +			goto drop;
>> +		}
>>  	}
>>
>>  	spin_lock_irqsave(&np->tx_lock, flags);
>> -- 
>> 1.9.1
>>
>>
>>
> 
> 




^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:11                 ` Wei Liu
  2014-05-30 12:28                   ` Stefan Bader
@ 2014-05-30 12:28                   ` David Laight
  2014-05-30 12:35                     ` Wei Liu
  1 sibling, 1 reply; 23+ messages in thread
From: David Laight @ 2014-05-30 12:28 UTC (permalink / raw)
  To: 'Wei Liu', Stefan Bader
  Cc: Zoltan Kiss, Eric Dumazet, netdev@vger.kernel.org,
	xen-devel@lists.xen.org, David Vrabel, Konrad Wilk,
	Boris Ostrovsky

From: Wei Liu
> On Fri, May 30, 2014 at 10:06:48AM +0200, Stefan Bader wrote:
> [...]
> > I had been idly wondering about this since then. And trying to understand the whole
> > skb handling environment, I tried to come up with some idea as well. It may be
> > totally stupid and based on the wrong assumptions. It seems to work in the sense
> > that things did not blow up in my face immediately, and somehow I did not see
> > dropped packets due to the number of slots either.
> > But again, I am not sure I am doing the right thing. The idea was to just try to
> > get rid of so many compound pages (which I believe are the only ones that can
> > have an offset big enough to allow some alignment savings)...
> >
> > -Stefan
> >
> 
> Thanks. I think the general idea is OK, but it still involves
> unnecessary page allocation. We don't actually need to get rid of
> the compound page by replacing it with a new page; we just need to make sure
> the data inside is aligned.
> 
> If you look at xennet_make_frags, it only grants the 4K page which
> contains data. I presume a simple memmove would be better than alloc_pages
> + memcpy. What do you think?
> 
> Like:
>    memmove(page_address(fpage), page_address(fpage)+offset, size);
>    frag->page_offset = 0;

Isn't the rest of the page likely to contain fragments of other Ethernet
frames? Possibly even other data?

	David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:28                   ` David Laight
@ 2014-05-30 12:35                     ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2014-05-30 12:35 UTC (permalink / raw)
  To: David Laight
  Cc: 'Wei Liu', Stefan Bader, Zoltan Kiss, Eric Dumazet,
	netdev@vger.kernel.org, xen-devel@lists.xen.org, David Vrabel,
	Konrad Wilk, Boris Ostrovsky

On Fri, May 30, 2014 at 12:28:53PM +0000, David Laight wrote:
> From: Wei Liu
> > On Fri, May 30, 2014 at 10:06:48AM +0200, Stefan Bader wrote:
> > [...]
> > > I had been idly wondering about this since then. And trying to understand the whole
> > > skb handling environment, I tried to come up with some idea as well. It may be
> > > totally stupid and based on the wrong assumptions. It seems to work in the sense
> > > that things did not blow up in my face immediately, and somehow I did not see
> > > dropped packets due to the number of slots either.
> > > But again, I am not sure I am doing the right thing. The idea was to just try to
> > > get rid of so many compound pages (which I believe are the only ones that can
> > > have an offset big enough to allow some alignment savings)...
> > >
> > > -Stefan
> > >
> > 
> > Thanks. I think the general idea is OK, but it still involves
> > unnecessary page allocation. We don't actually need to get rid of
> > the compound page by replacing it with a new page; we just need to make sure
> > the data inside is aligned.
> > 
> > If you look at xennet_make_frags, it only grants the 4K page which
> > contains data. I presume a simple memmove would be better than alloc_pages
> > + memcpy. What do you think?
> > 
> > Like:
> >    memmove(page_address(fpage), page_address(fpage)+offset, size);
> >    frag->page_offset = 0;
> 
> Isn't the rest of the page likely to contain fragments of other Ethernet
> frames? Possibly even other data?
> 

You're right, this is a valid concern.

^ permalink raw reply	[flat|nested] 23+ messages in thread
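
The concern above pins down when an in-place memmove would be legal at all:
frag pages can be shared (cloned skbs, or page-cache pages queued via
sendfile), and their contents must not be rewritten. Below is a hedged,
untested sketch of how the shortcut could be gated, falling back to the
alloc_pages() + memcpy() in Stefan's patch otherwise. The function name is
hypothetical, page_count() == 1 is only a conservative heuristic for
exclusive ownership, and it assumes the compound page is directly mapped
(as on a 64-bit guest):

/* Hypothetical helper: slide frag data to the start of its first subpage. */
static bool xennet_frag_shuffle(skb_frag_t *frag)
{
	struct page *fpage = skb_frag_page(frag);
	unsigned long aligned = frag->page_offset & PAGE_MASK;

	/* Somebody else may see this page: leave its contents alone. */
	if (page_count(fpage) != 1)
		return false;

	/* Overlap-safe move; no allocation, no extra page reference. */
	memmove(page_address(fpage) + aligned, skb_frag_address(frag),
		skb_frag_size(frag));
	frag->page_offset = aligned;
	return true;
}

A caller in xennet_align_frags() would try this first and only fall back to
allocating a fresh page when the frag's page is shared.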

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:07                 ` Zoltan Kiss
@ 2014-05-30 12:37                   ` Stefan Bader
  2014-07-02 12:23                   ` Stefan Bader
  1 sibling, 0 replies; 23+ messages in thread
From: Stefan Bader @ 2014-05-30 12:37 UTC (permalink / raw)
  To: Zoltan Kiss, Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky

On 30.05.2014 14:07, Zoltan Kiss wrote:
> On 30/05/14 09:06, Stefan Bader wrote:
>> On 16.05.2014 18:29, Zoltan Kiss wrote:
>>> On 16/05/14 16:34, Wei Liu wrote:
>>>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>>>>
>>>>>>>> It's not that common to trigger this, I only saw a few reports. In fact
>>>>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>>>>> it.
>>>>>>>>
>>>>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>>>>
>>>>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>>>>> failure. Thanks.
>>>>> In the mean time, have you tried to lower gso_max_size ?
>>>>>
>>>>> Setting it with netif_set_gso_max_size() to something like 56000 might
>>>>> avoid the problem.
>>>>>
>>>>> (Not sure if it is applicable in your case)
>>>>>
>>>> It works, at least in this Redis testcase. Could you explain a bit where
>>>> this 56000 magic number comes from? :-)
>>>>
>>>> Presumably I can derive it from some constant in core network code?
>>> I guess it just makes it more unlikely to have packets with a problematic layout.
>>> But the following packet would still fail:
>>> linear buffer: 80 bytes, spanning 2 pages
>>> 17 frags, 80 bytes each, each spanning a page boundary.
>>>
>>> I just had an idea: a modified version of the xenvif_handle_frag_list function
>>> from netback would be useful for us here. It recreates the frags array on
>>> fully utilized 4k pages. Plus we can use pskb_expand_head to reduce the page
>>> number on the linear buffer (although it might not work, see my comment in
>>> the patch)
>>> The worst case linear buffer then spans N+1 pages, and has N*PAGE_SIZE+1
>>> bytes. Then the frags after this coalescing should have 16*PAGE_SIZE -
>>> (N*PAGE_SIZE+2) bytes at most, which is 16-N pages. Altogether that's 16+1
>>> page, which should definitely fit!
>>> This is what I mean:
>>>
>> I had been idly wondering about this since then. And trying to understand the whole
>> skb handling environment, I tried to come up with some idea as well. It may be
>> totally stupid and based on the wrong assumptions. It seems to work in the sense
>> that things did not blow up in my face immediately, and somehow I did not see
>> dropped packets due to the number of slots either.
>> But again, I am not sure I am doing the right thing. The idea was to just try to
>> get rid of so many compound pages (which I believe are the only ones that can
>> have an offset big enough to allow some alignment savings)...
> Hi,
> 
> This probably helps in a lot of scenarios, but not for everything. E.g. if the
> skb has 15 frags, each PAGE_SIZE+1 bytes, it'll still consume 30 slots for the
> frags, and this function won't be able to help that.
> It's hard to come up with an algorithm which handles all the scenarios we can
> have here, and does that with the least possible amount of copying. I'll keep
> looking into this in the coming weeks, but don't hesitate if you have an idea!

Right, this can fail, and then the packet still gets dropped. I could only test
against that redis-benchmark use case, which in general seemed to generate about
4 or 5 frags, mostly on compound pages; with additional printk statements the
common case was 19 slots (rarely 20). So the recovery usually needs to handle a
reduction of 1 or 2 pages. This is one reason I decided to give that function a
target for how much it needs to do (and also not to push its luck too hard on
new memory allocations).

-Stefan
> 
> Regards,
> 
> Zoltan
>>
>> -Stefan
>>
>>
>>  From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Thu, 29 May 2014 12:18:01 +0200
>> Subject: [PATCH] xen-netfront: Align frags to fit max slots
>>
>> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
>> (= 19) 4K grant pages, try to reduce the footprint by moving the data
>> to new pages, aligned to the beginning of a page.
>> Then replace the page in the frag and release the old one. This is
>> certainly more expensive in compute, but it should not happen too often
>> and is better than simply dropping the packet in that case.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>   drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 158b5e6..ad71e5c 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>>       return pages;
>>   }
>>
>> +/*
>> + * Align data to new pages in order to save slots required to
>> + * transmit this buffer.
>> + * @skb - socket buffer
>> + * @target - number of pages to save
>> + * returns the number of pages by which the fragments have been reduced
>> + */
>> +static int xennet_align_frags(struct sk_buff *skb, int target)
>> +{
>> +    int i, frags = skb_shinfo(skb)->nr_frags;
>> +    int reduced = 0;
>> +
>> +    for (i = 0; i < frags; i++) {
>> +        skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> +        struct page *fpage = skb_frag_page(frag);
>> +        struct page *npage;
>> +        unsigned long size;
>> +        unsigned long offset;
>> +        gfp_t gfp;
>> +        int order;
>> +
>> +        if (!PageCompound(fpage))
>> +            continue;
>> +
>> +        size = skb_frag_size(frag);
>> +        offset = frag->page_offset & ~PAGE_MASK;
>> +
>> +        /*
>> +         * If the length of data in the last subpage of a compound
>> +         * page is smaller than the offset into the first data sub-
>> +         * page, we can save a subpage by copying data around.
>> +         */
>> +        if (((offset + size) & ~PAGE_MASK) > offset)
>> +            continue;
>> +
>> +        gfp = GFP_ATOMIC | __GFP_COLD;
>> +        order = get_order(size);
>> +        if (order)
>> +            gfp |= __GFP_COMP | __GFP_NOWARN;
>> +
>> +        npage = alloc_pages(gfp, order);
>> +        if (!npage)
>> +            break;
>> +        memcpy(page_address(npage), skb_frag_address(frag), size);
>> +        frag->page.p = npage;
>> +        frag->page_offset = 0;
>> +        put_page(fpage);
>> +
>> +        if (++reduced >= target)
>> +            break;
>> +    }
>> +
>> +    return reduced;
>> +}
>> +
>>   static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>       unsigned short id;
>> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct
>> net_device *dev)
>>       slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>           xennet_count_skb_frag_slots(skb);
>>       if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> -        net_alert_ratelimited(
>> -            "xennet: skb rides the rocket: %d slots\n", slots);
>> -        goto drop;
>> +        slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
>> +        if (slots > MAX_SKB_FRAGS + 1) {
>> +            net_alert_ratelimited(
>> +                "xennet: skb rides the rocket: %d slots\n",
>> +                slots);
>> +            goto drop;
>> +        }
>>       }
>>
>>       spin_lock_irqsave(&np->tx_lock, flags);
> 




^ permalink raw reply	[flat|nested] 23+ messages in thread
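
The 15 x (PAGE_SIZE + 1) case shows the limit of pure realignment: each such
frag needs two grant slots no matter where it starts, so only repacking the
payload back-to-back onto fully utilized pages (the xenvif_handle_frag_list
approach mentioned earlier in the thread) brings the count back under the
ring limit. A rough standalone model of that arithmetic (userspace C,
assuming a limit of 19 slots, i.e. MAX_SKB_FRAGS + 1 with 4K pages):

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define MAX_SLOTS 19UL	/* MAX_SKB_FRAGS + 1 on 4K pages */

int main(void)
{
	unsigned long nfrags = 15, fsize = PAGE_SIZE + 1;
	unsigned long bytes = nfrags * fsize;

	/* As laid out: every frag covers two subpages, even page-aligned. */
	unsigned long sparse = nfrags * ((fsize + PAGE_SIZE - 1) / PAGE_SIZE);

	/* Repacked contiguously onto fully utilized pages. */
	unsigned long packed = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;

	printf("%lu bytes: %lu slots as laid out, %lu repacked (limit %lu)\n",
	       bytes, sparse, packed, MAX_SLOTS);
	return 0;
}

That prints 30 slots as laid out against 16 repacked for 61455 bytes: the
data volume is fine, only the layout is not.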

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:28                   ` Stefan Bader
@ 2014-05-30 12:38                     ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2014-05-30 12:38 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Wei Liu, Zoltan Kiss, Eric Dumazet, netdev, xen-devel,
	David Vrabel, Konrad Wilk, Boris Ostrovsky

On Fri, May 30, 2014 at 02:28:17PM +0200, Stefan Bader wrote:
> On 30.05.2014 14:11, Wei Liu wrote:
> > On Fri, May 30, 2014 at 10:06:48AM +0200, Stefan Bader wrote:
> > [...]
> >> I had been idly wondering about this since then. And trying to understand the whole
> >> skb handling environment, I tried to come up with some idea as well. It may be
> >> totally stupid and based on the wrong assumptions. It seems to work in the sense
> >> that things did not blow up in my face immediately, and somehow I did not see
> >> dropped packets due to the number of slots either.
> >> But again, I am not sure I am doing the right thing. The idea was to just try to
> >> get rid of so many compound pages (which I believe are the only ones that can
> >> have an offset big enough to allow some alignment savings)...
> >>
> >> -Stefan
> >>
> > 
> > Thanks. I think the general idea is OK, but it still involves
> > unnecessary page allocation. We don't actually need to get rid of
> > the compound page by replacing it with a new page; we just need to make sure
> > the data inside is aligned.
> > 
> > If you look at xennet_make_frags, it only grants the 4K page which
> > contains data. I presume a simple memmove would be better than alloc_pages
> > + memcpy. What do you think?
> > 
> > Like:
> >    memmove(page_address(fpage), page_address(fpage)+offset, size);
> >    frag->page_offset = 0;
> 
> I was hesitant to do that, as I could not really tell whether I could make any
> assumptions about those memory areas. So my cautious choice was to leave the
> original pages alone altogether (whoever owns them may have something
> important in the starting area). If I did it right, my new page can
> potentially be a smaller allocation unit than the original, since I only ask
> for something big enough to hold the frag size (ignoring a potential offset
> beyond full 4K areas).
> 

Thanks for the explanation.

Wei.

> -Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-05-30 12:07                 ` Zoltan Kiss
  2014-05-30 12:37                   ` Stefan Bader
@ 2014-07-02 12:23                   ` Stefan Bader
  2014-07-02 13:12                     ` Zoltan Kiss
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Bader @ 2014-07-02 12:23 UTC (permalink / raw)
  To: Zoltan Kiss, Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky

On 30.05.2014 14:07, Zoltan Kiss wrote:
> On 30/05/14 09:06, Stefan Bader wrote:
>> On 16.05.2014 18:29, Zoltan Kiss wrote:
>>> On 16/05/14 16:34, Wei Liu wrote:
>>>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>>>>
>>>>>>>> It's not that common to trigger this, I only saw a few reports. In fact
>>>>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>>>>> it.
>>>>>>>>
>>>>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>>>>
>>>>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>>>>> failure. Thanks.
>>>>> In the mean time, have you tried to lower gso_max_size ?
>>>>>
>>>>> Setting it with netif_set_gso_max_size() to something like 56000 might
>>>>> avoid the problem.
>>>>>
>>>>> (Not sure if it is applicable in your case)
>>>>>
>>>> It works, at least in this Redis testcase. Could you explain a bit where
>>>> this 56000 magic number comes from? :-)
>>>>
>>>> Presumably I can derive it from some constant in core network code?
>>> I guess it just makes it more unlikely to have packets with a problematic layout.
>>> But the following packet would still fail:
>>> linear buffer: 80 bytes, spanning 2 pages
>>> 17 frags, 80 bytes each, each spanning a page boundary.
>>>
>>> I just had an idea: a modified version of the xenvif_handle_frag_list function
>>> from netback would be useful for us here. It recreates the frags array on
>>> fully utilized 4k pages. Plus we can use pskb_expand_head to reduce the page
>>> number on the linear buffer (although it might not work, see my comment in
>>> the patch)
>>> The worst case linear buffer then spans N+1 pages, and has N*PAGE_SIZE+1
>>> bytes. Then the frags after this coalescing should have 16*PAGE_SIZE -
>>> (N*PAGE_SIZE+2) bytes at most, which is 16-N pages. Altogether that's 16+1
>>> page, which should definitely fit!
>>> This is what I mean:
>>>
>> I had been idly wondering about this since then. And trying to understand the whole
>> skb handling environment, I tried to come up with some idea as well. It may be
>> totally stupid and based on the wrong assumptions. It seems to work in the sense
>> that things did not blow up in my face immediately, and somehow I did not see
>> dropped packets due to the number of slots either.
>> But again, I am not sure I am doing the right thing. The idea was to just try to
>> get rid of so many compound pages (which I believe are the only ones that can
>> have an offset big enough to allow some alignment savings)...
> Hi,
> 
> This probably helps in a lot of scenarios, but not for everything. E.g. if the
> skb has 15 frags, each PAGE_SIZE+1 bytes, it'll still consume 30 slots for the
> frags, and this function won't be able to help that.
> It's hard to come up with an algorithm which handles all the scenarios we can
> have here, and does that with the least possible amount of copying. I'll keep
> looking into this in the coming weeks, but don't hesitate if you have an idea!

Hi Zoltan,

I lost track of this a bit. I think I saw a summary from you about brainstorming
something similar, but for the backend (not sure), I think during the last
Xen Hackathon. I cannot remember whether it got to some solution on the netfront
side. Am I missing something?

-Stefan
> 
> Regards,
> 
> Zoltan
>>
>> -Stefan
>>
>>
>>  From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Thu, 29 May 2014 12:18:01 +0200
>> Subject: [PATCH] xen-netfront: Align frags to fit max slots
>>
>> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
>> (= 19) 4K grant pages, try to reduce the footprint by moving the data
>> to new pages, aligned to the beginning of a page.
>> Then replace the page in the frag and release the old one. This is
>> certainly more expensive in compute, but it should not happen too often
>> and is better than simply dropping the packet in that case.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>   drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 158b5e6..ad71e5c 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>>       return pages;
>>   }
>>
>> +/*
>> + * Align data to new pages in order to save slots required to
>> + * transmit this buffer.
>> + * @skb - socket buffer
>> + * @target - number of pages to save
>> + * returns the number of pages by which the fragments have been reduced
>> + */
>> +static int xennet_align_frags(struct sk_buff *skb, int target)
>> +{
>> +    int i, frags = skb_shinfo(skb)->nr_frags;
>> +    int reduced = 0;
>> +
>> +    for (i = 0; i < frags; i++) {
>> +        skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> +        struct page *fpage = skb_frag_page(frag);
>> +        struct page *npage;
>> +        unsigned long size;
>> +        unsigned long offset;
>> +        gfp_t gfp;
>> +        int order;
>> +
>> +        if (!PageCompound(fpage))
>> +            continue;
>> +
>> +        size = skb_frag_size(frag);
>> +        offset = frag->page_offset & ~PAGE_MASK;
>> +
>> +        /*
>> +         * If the length of data in the last subpage of a compound
>> +         * page is smaller than the offset into the first data sub-
>> +         * page, we can save a subpage by copying data around.
>> +         */
>> +        if (((offset + size) & ~PAGE_MASK) > offset)
>> +            continue;
>> +
>> +        gfp = GFP_ATOMIC | __GFP_COLD;
>> +        order = get_order(size);
>> +        if (order)
>> +            gfp |= __GFP_COMP | __GFP_NOWARN;
>> +
>> +        npage = alloc_pages(gfp, order);
>> +        if (!npage)
>> +            break;
>> +        memcpy(page_address(npage), skb_frag_address(frag), size);
>> +        frag->page.p = npage;
>> +        frag->page_offset = 0;
>> +        put_page(fpage);
>> +
>> +        if (++reduced >= target)
>> +            break;
>> +    }
>> +
>> +    return reduced;
>> +}
>> +
>>   static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>       unsigned short id;
>> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct
>> net_device *dev)
>>       slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>           xennet_count_skb_frag_slots(skb);
>>       if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> -        net_alert_ratelimited(
>> -            "xennet: skb rides the rocket: %d slots\n", slots);
>> -        goto drop;
>> +        slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
>> +        if (slots > MAX_SKB_FRAGS + 1) {
>> +            net_alert_ratelimited(
>> +                "xennet: skb rides the rocket: %d slots\n",
>> +                slots);
>> +            goto drop;
>> +        }
>>       }
>>
>>       spin_lock_irqsave(&np->tx_lock, flags);
> 




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
  2014-07-02 12:23                   ` Stefan Bader
@ 2014-07-02 13:12                     ` Zoltan Kiss
  0 siblings, 0 replies; 23+ messages in thread
From: Zoltan Kiss @ 2014-07-02 13:12 UTC (permalink / raw)
  To: Stefan Bader, Wei Liu, Eric Dumazet
  Cc: netdev, xen-devel, David Vrabel, Konrad Wilk, Boris Ostrovsky

Hi,

Yes, it has lost a little traction recently. The latest idea is to use
the skb segmenting code in netfront to slice the skb into a set of skbs
that each fit through the ring. But to test that out I need to make
pktgen work with TCP; that's where this got stalled.

Zoli


On 02/07/14 13:23, Stefan Bader wrote:
> On 30.05.2014 14:07, Zoltan Kiss wrote:
>> On 30/05/14 09:06, Stefan Bader wrote:
>>> On 16.05.2014 18:29, Zoltan Kiss wrote:
>>>> On 16/05/14 16:34, Wei Liu wrote:
>>>>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>>>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>
> Hi Zoltan,
>
> I lost track of this a bit. I think I saw a summary from you about brainstorming
> something similar, but for the backend (not sure), I think during the last
> Xen Hackathon. I cannot remember whether it got to some solution on the netfront
> side. Am I missing something?
>
> -Stefan
>>
>> Regards,
>>
>> Zoltan

^ permalink raw reply	[flat|nested] 23+ messages in thread
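
A hedged sketch of how that segmenting fallback might look in the transmit
path; this is untested, the function name is hypothetical, and error and
accounting policy are glossed over. skb_gso_segment() with no feature flags
produces linear, MSS-sized skbs, each of which fits the ring by construction,
and the approach only applies when skb_is_gso() is true:

/* Hypothetical fallback: resegment an oversized GSO skb in software. */
static netdev_tx_t xennet_xmit_segmented(struct sk_buff *skb,
					 struct net_device *dev)
{
	struct sk_buff *segs, *seg;

	segs = skb_gso_segment(skb, 0);	/* no features: linear segments */
	if (IS_ERR_OR_NULL(segs))
		return NETDEV_TX_BUSY;	/* drop vs. requeue policy left open */

	consume_skb(skb);
	while (segs) {
		seg = segs;
		segs = segs->next;
		seg->next = NULL;
		/* Each segment is small enough to pass the slot check. */
		xennet_start_xmit(seg, dev);
	}
	return NETDEV_TX_OK;
}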

end of thread, other threads:[~2014-07-02 13:12 UTC | newest]

Thread overview: 23+ messages
2014-05-16 11:08 [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots Wei Liu
2014-05-16 13:04 ` Eric Dumazet
2014-05-16 13:11   ` Wei Liu
2014-05-16 14:21     ` Eric Dumazet
2014-05-16 14:36       ` Wei Liu
2014-05-16 15:22         ` Eric Dumazet
2014-05-16 15:34           ` Wei Liu
2014-05-16 16:29             ` Zoltan Kiss
2014-05-16 16:47               ` Eric Dumazet
2014-05-16 16:51                 ` Zoltan Kiss
2014-05-16 17:00                   ` Eric Dumazet
2014-05-16 16:54               ` Wei Liu
2014-05-19 16:47                 ` Zoltan Kiss
2014-05-30  8:06               ` Stefan Bader
2014-05-30 12:07                 ` Zoltan Kiss
2014-05-30 12:37                   ` Stefan Bader
2014-07-02 12:23                   ` Stefan Bader
2014-07-02 13:12                     ` Zoltan Kiss
2014-05-30 12:11                 ` Wei Liu
2014-05-30 12:28                   ` Stefan Bader
2014-05-30 12:38                     ` Wei Liu
2014-05-30 12:28                   ` David Laight
2014-05-30 12:35                     ` Wei Liu
