From: Adnan Misherfi <adnan.misherfi@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
Ben Hutchings <bhutchings@solarflare.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] xen/netback: calculate correctly the SKB slots.
Date: Tue, 22 May 2012 15:24:02 -0400 [thread overview]
Message-ID: <4FBBE7D2.9040105@oracle.com> (raw)
In-Reply-To: <20120522180901.GC22488@phenom.dumpdata.com>
Konrad Rzeszutek Wilk wrote:
>>>> wrong, which caused the RX ring to be erroneously declared full,
>>>> and the receive queue to be stopped. The problem shows up when two
>>>> guest running on the same server tries to communicates using large
>>>>
> .. snip..
>
>>> The function name is xen_netbk_count_skb_slots() in net-next. This
>>> appears to depend on the series in
>>> <http://lists.xen.org/archives/html/xen-devel/2012-01/msg00982.html>.
>>>
>> Yes, I don't think that patchset was intended for prime time just yet.
>> Can this issue be reproduced without it?
>>
>
> It was based on 3.4, but the bug and work to fix this was done on top of
> a 3.4 version of netback backported in a 3.0 kernel. Let me double check
> whether there were some missing patches.
>
>
>>>> int i, copy_off;
>>>>
>>>> count = DIV_ROUND_UP(
>>>> - offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
>>>> + offset_in_page(skb->data + skb_headlen(skb)), PAGE_SIZE);
>>>>
>>> The new version would be equivalent to:
>>> count = offset_in_page(skb->data + skb_headlen(skb)) != 0;
>>> which is not right, as netbk_gop_skb() will use one slot per page.
>>>
>> Just outside the context of this patch we separately count the frag
>> pages.
>>
>> However I think you are right if skb->data covers > 1 page, since the
>> new version can only ever return 0 or 1. I expect this patch papers over
>> the underlying issue by not stopping often enough, rather than actually
>> fixing the underlying issue.
>>
>
> Ah, any thoughts? Have you guys seen this behavior as well?
>
>>> The real problem is likely that you're not using the same condition to
>>> stop and wake the queue.
>>>
>> Agreed, it would be useful to see the argument for this patch presented
>> in that light. In particular the relationship between
>> xenvif_rx_schedulable() (used to wake queue) and
>> xen_netbk_must_stop_queue() (used to stop queue).
>>
>
> Do you have any debug patches to ... do open-heart surgery on the
> rings of netback as its hitting the issues Adnan has found?
>
>
>> As it stands the description describes a setup which can repro the
>> problem but doesn't really analyse what actually happens, nor justify
>> the correctness of the fix.
>>
>
> Hm, Adnan - you dug in to this and you got tons of notes. Could you
> describe what you saw that caused this?
>
The problem is that the function xen_netbk_count_skb_slots() returns two
different counts for same type packets of same size (ICMP,3991). At the
start of the test
the count is one, later on the count changes to two, soon after the
counts becomes two, the condition ring full becomes true, and queue get
stopped, and never gets
started again.There are few point to make here:
1- It takes less that 128 ping packets to reproduce this
2- What is interesting here is that it works correct for many packet
sizes including 1500,400,500 9000, (3990, but not 3991)
3- The inconsistent count for the same packet size and type
4- I do not believe the ring was actually full when it was declared
full, I think the consumer pointer was wrong. (vif->rx_req_cons_peek in
function xenvif_start_xmit())
5- After changing the code the count returned from
xen_netbk_count_skb_slots() was always consistent, and worked just fine,
I let it runs for at least 12 hours.
next prev parent reply other threads:[~2012-05-22 19:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-21 17:36 [PATCH] xen/netback: calculate correctly the SKB slots Konrad Rzeszutek Wilk
2012-05-21 19:14 ` Ben Hutchings
2012-05-22 9:21 ` Ian Campbell
2012-05-22 18:09 ` Konrad Rzeszutek Wilk
2012-05-22 19:01 ` Simon Graham
2012-05-22 19:28 ` Ian Campbell
2012-05-22 20:03 ` Simon Graham
2012-05-23 13:12 ` Konrad Rzeszutek Wilk
2012-05-23 14:17 ` Simon Graham
2012-05-22 19:24 ` Adnan Misherfi [this message]
2012-05-24 11:12 ` Ian Campbell
2012-05-22 18:01 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FBBE7D2.9040105@oracle.com \
--to=adnan.misherfi@oracle.com \
--cc=Ian.Campbell@citrix.com \
--cc=bhutchings@solarflare.com \
--cc=davem@davemloft.net \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).