From: Zoltan Kiss <zoltan.kiss@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>, Eric Dumazet <eric.dumazet@gmail.com>
Cc: <netdev@vger.kernel.org>, <xen-devel@lists.xen.org>,
David Vrabel <david.vrabel@citrix.com>,
Konrad Wilk <konrad.wilk@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Stefan Bader <stefan.bader@canonical.com>
Subject: Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
Date: Fri, 16 May 2014 17:29:05 +0100 [thread overview]
Message-ID: <53763CD1.6060500@citrix.com> (raw)
In-Reply-To: <20140516153452.GM18551@zion.uk.xensource.com>
On 16/05/14 16:34, Wei Liu wrote:
> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>
>>>>> It's not that common to trigger this, I only saw a few reports. In fact
>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>> it.
>>>>>
>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>> few "failed to linearize", never saw a single one with 1GB guest.
>>>>
>>>> Well, I am just saying. This is asking order-5 allocations, and yes,
>>>> this is going to fail after few days of uptime, no matter what you try.
>>>>
>>>
>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>> failure. Thanks.
>>
>> In the mean time, have you tried to lower gso_max_size ?
>>
>> Setting it witk netif_set_gso_max_size() to something like 56000 might
>> avoid the problem.
>>
>> (Not sure if it is applicable in your case)
>>
>
> It works, at least in this Redis testcase. Could you explain a bit where
> this 56000 magic number comes from? :-)
>
> Presumably I can derive it from some constant in core network code?
I guess it just makes more unlikely to have packets with problematic layout. But the following packet would still fail:
linear buffer : 80 bytes, on 2 pages
17 frags, 80 bytes each, each spanning over page boundary.
I just had an idea: a modified version of xenvif_handle_frag_list function from netback would be useful for us here. It recreates the frags array on fully utilized 4k pages. Plus we can use pskb_expand_head to reduce the page number on the linear buffer (although it might not work, see my comment in the patch)
The worst case linear buffer then spans N+1 pages, and has N*PAGE_SIZE+1 bytes. Then the frags after this coalescing should have 16*PAGE_SIZE - (N*PAGE_SIZE+2) bytes at most, which is 16-N pages. Altogether that's 16+1 page, which should definitely fit!
This is what I mean:
8<--------------
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 158b5e6..b1133d6 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -544,6 +544,73 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
return pages;
}
+int xenvif_reduce_pages(struct sk_buff *skb, int target)
+{
+ unsigned int offset = skb_headlen(skb);
+ skb_frag_t frags[MAX_SKB_FRAGS];
+ int newfrags, oldfrags;
+ unsigned int pages, optimal;
+
+ BUG_ON(!target);
+
+ pages = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), PAGE_SIZE);
+ optimal = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+ if (pages - optimal) {
+ int err;
+/* FIXME: we should check if pskb_expand_head really allocates on page boundary,
+ * otherwise we can still have suboptimal page layout */
+ if (unlikely(err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
+ return err;
+ target -= pages - optimal;
+ if (!target)
+ return 0;
+ }
+
+ /* Subtract frags size, we will correct it later */
+ skb->truesize -= skb->data_len;
+
+ /* Create a brand new frags array and coalesce there */
+ for (newfrags = 0; offset < skb->len; newfrags++) {
+ struct page *page;
+ unsigned int len;
+
+ BUG_ON(newfrags >= MAX_SKB_FRAGS);
+ page = alloc_page(GFP_ATOMIC);
+ if (!page) {
+ int j;
+ skb->truesize += skb->data_len;
+ for (j = 0; j < newfrags; j++)
+ put_page(frags[j].page.p);
+ return -ENOMEM;
+ }
+
+ if (offset + PAGE_SIZE < skb->len)
+ len = PAGE_SIZE;
+ else
+ len = skb->len - offset;
+ if (skb_copy_bits(skb, offset, page_address(page), len))
+ BUG();
+
+ offset += len;
+ frags[newfrags].page.p = page;
+ frags[newfrags].page_offset = 0;
+ skb_frag_size_set(&frags[newfrags], len);
+ }
+
+ /* Drop the original buffers */
+ for (oldfrags = 0; oldfrags < skb_shinfo(skb)->nr_frags; oldfrags++)
+ skb_frag_unref(skb, oldfrags);
+
+ /* Swap the new frags array with the old one */
+ memcpy(skb_shinfo(skb)->frags,
+ frags,
+ newfrags * sizeof(skb_frag_t));
+ skb_shinfo(skb)->nr_frags = newfrags;
+ /* Correct truesize */
+ skb->truesize += newfrags * PAGE_SIZE;
+ return 0;
+}
+
static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
unsigned short id;
@@ -573,11 +640,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
xennet_count_skb_frag_slots(skb);
if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
- net_alert_ratelimited(
- "xennet: skb rides the rocket: %d slots\n", slots);
- goto drop;
+ if (unlikely(xenvif_reduce_pages(skb, slots - MAX_SKB_FRAGS - 1))) {
+ net_alert_ratelimited(
+ "xennet: couldn't reduce slot number from %d\n", slots);
+ goto drop;
+ }
+ slots = DIV_ROUND_UP(offset_in_page(data) + len, PAGE_SIZE) +
+ xennet_count_skb_frag_slots(skb);
+ if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+ net_alert_ratelimited(
+ "xennet: slot reduction doesn't work, slots: %d\n", slots);
+ goto drop;
+ }
}
+
spin_lock_irqsave(&np->tx_lock, flags);
if (unlikely(!netif_carrier_ok(dev) ||
next prev parent reply other threads:[~2014-05-16 16:29 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-16 11:08 [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots Wei Liu
2014-05-16 13:04 ` Eric Dumazet
2014-05-16 13:04 ` Eric Dumazet
2014-05-16 13:11 ` Wei Liu
2014-05-16 13:11 ` Wei Liu
2014-05-16 14:21 ` Eric Dumazet
2014-05-16 14:36 ` Wei Liu
2014-05-16 15:22 ` Eric Dumazet
2014-05-16 15:34 ` Wei Liu
2014-05-16 16:29 ` Zoltan Kiss [this message]
2014-05-16 16:47 ` Eric Dumazet
2014-05-16 16:47 ` Eric Dumazet
2014-05-16 16:51 ` Zoltan Kiss
2014-05-16 16:51 ` Zoltan Kiss
2014-05-16 17:00 ` Eric Dumazet
2014-05-16 17:00 ` Eric Dumazet
2014-05-16 16:54 ` Wei Liu
2014-05-16 16:54 ` Wei Liu
2014-05-19 16:47 ` Zoltan Kiss
2014-05-19 16:47 ` Zoltan Kiss
2014-05-30 8:06 ` Stefan Bader
2014-05-30 8:06 ` Stefan Bader
2014-05-30 12:07 ` Zoltan Kiss
2014-05-30 12:07 ` Zoltan Kiss
2014-05-30 12:37 ` Stefan Bader
2014-05-30 12:37 ` Stefan Bader
2014-07-02 12:23 ` Stefan Bader
2014-07-02 13:12 ` Zoltan Kiss
2014-07-02 13:12 ` Zoltan Kiss
2014-07-02 12:23 ` Stefan Bader
2014-05-30 12:11 ` Wei Liu
2014-05-30 12:11 ` Wei Liu
2014-05-30 12:28 ` Stefan Bader
2014-05-30 12:28 ` Stefan Bader
2014-05-30 12:38 ` Wei Liu
2014-05-30 12:38 ` Wei Liu
2014-05-30 12:28 ` David Laight
2014-05-30 12:28 ` David Laight
2014-05-30 12:35 ` Wei Liu
2014-05-30 12:35 ` Wei Liu
2014-05-16 16:29 ` Zoltan Kiss
2014-05-16 15:34 ` Wei Liu
2014-05-16 15:22 ` Eric Dumazet
2014-05-16 14:36 ` Wei Liu
2014-05-16 14:21 ` Eric Dumazet
-- strict thread matches above, loose matches on Subject: below --
2014-05-16 11:08 Wei Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53763CD1.6060500@citrix.com \
--to=zoltan.kiss@citrix.com \
--cc=boris.ostrovsky@oracle.com \
--cc=david.vrabel@citrix.com \
--cc=eric.dumazet@gmail.com \
--cc=konrad.wilk@oracle.com \
--cc=netdev@vger.kernel.org \
--cc=stefan.bader@canonical.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.