From: jerry <jerry.lilijun@huawei.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
Wei Liu <wei.liu2@citrix.com>,
stefano.stabellini@eu.citrix.com
Subject: Re: netback BUG_ON when using copy_skb=1
Date: Thu, 17 Oct 2013 15:41:51 +0800
Message-ID: <525F94BF.6050500@huawei.com>
In-Reply-To: <525E903602000078000FB6DF@nat28.tlf.novell.com>
Hi Jan,
Thanks for your reply.
Yes, as you assumed, I am using the SLE11 kernel 3.0.58, which is not up to date.
I found a related patch named xen-netback-generalize, which was committed on Aug 7 and has been applied to the SLE11 kernel 3.0.98.
This patch removes the BUG_ON(netbk->mmap_pages[idx] != page).
But there may still be concurrency problems in my test.
If the page replacement in copy_pending_req() happens after netif_get_page_ext() has already run in netbk_gop_frag(), copy_gop->flags is wrongly marked with GNTCOPY_source_gref.
By then the page in the skb has been replaced with Dom0 local memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in netbk_rx_actions() fails.
The message shown is:
(XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
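The ordering described above can be modelled in user space as follows. This is a minimal sketch under stated assumptions, not the netback code: every model_* name and MODEL_COPY_SOURCE_GREF are hypothetical stand-ins, and the three steps only mirror what netif_get_page_ext(), copy_pending_req() and the building of copy_gop are described as doing in this message. The two threads are replaced by a flag that chooses the interleaving, so the result is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; none of these types are the real netback ones. */
enum model_page_kind { MODEL_PAGE_FOREIGN, MODEL_PAGE_LOCAL };

struct model_page { enum model_page_kind kind; };

/* Stands in for one skb fragment pointing at a page. */
struct model_frag { struct model_page *page; };

/* Hypothetical stand-in for GNTCOPY_source_gref. */
#define MODEL_COPY_SOURCE_GREF 0x1u

struct model_copy_op {
    unsigned int flags;
    const struct model_page *source;
};

/* Step A: classify the page (the role attributed to netif_get_page_ext()). */
static bool model_is_foreign(const struct model_frag *frag)
{
    return frag->page->kind == MODEL_PAGE_FOREIGN;
}

/* Step B: the delayed copy swaps in a local page (the role attributed to
 * copy_pending_req()). */
static void model_replace_page(struct model_frag *frag, struct model_page *local)
{
    frag->page = local;
}

/* Step C: build the copy operation from the earlier classification result,
 * but from whatever page the fragment points at now. */
static struct model_copy_op model_build_op(const struct model_frag *frag,
                                           bool was_foreign)
{
    struct model_copy_op op;

    op.flags = was_foreign ? MODEL_COPY_SOURCE_GREF : 0u;
    op.source = frag->page;
    return op;
}

/* The operation is consistent only if the "source is a grant reference"
 * flag is set exactly when the source page is foreign. */
static bool model_op_is_consistent(const struct model_copy_op *op)
{
    bool flagged = (op->flags & MODEL_COPY_SOURCE_GREF) != 0u;
    bool foreign = op->source->kind == MODEL_PAGE_FOREIGN;

    return flagged == foreign;
}

/* Run steps A, (B), C in order. replace_between_check_and_use selects the
 * suspected bad interleaving, where B lands between A and C. */
bool model_run(bool replace_between_check_and_use)
{
    struct model_page foreign = { MODEL_PAGE_FOREIGN };
    struct model_page local = { MODEL_PAGE_LOCAL };
    struct model_frag frag = { &foreign };
    struct model_copy_op op;
    bool was_foreign;

    was_foreign = model_is_foreign(&frag);
    if (replace_between_check_and_use)
        model_replace_page(&frag, &local);
    op = model_build_op(&frag, was_foreign);
    return model_op_is_consistent(&op);
}

/* Usage: exercise both interleavings. */
void model_self_check(void)
{
    assert(model_run(false));  /* no replacement: flag matches the page */
    assert(!model_run(true));  /* replacement in between: stale flag */
}
```

In this model the operation is only inconsistent when the replacement lands between the check and the use, which matches the window suspected above; it says nothing about how wide that window is in the real driver.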
Could you share your thoughts on this?
Regards,
Jerry
On 2013/10/16 19:10, Jan Beulich wrote:
>>>> On 16.10.13 at 06:13, jerry <jerry.lilijun@huawei.com> wrote:
>> Hi Wei Liu,
>>
>> I am doing some network performance testing on Xen 4.1.2 and kernel 3.0, and
>> accidentally hit a crash with BUG_ON(netbk->mmap_pages[idx] != page) in
>> netbk_gop_frag().
>>
>> By analyzing the module drivers/xen/netback,
>
> You aren't looking at the upstream driver, are you? If so, Wei is
> very likely the wrong addressee.
>
> Assuming that you instead talk of the SLE11 kernel, I can only
> point out that a problem in that code was found and fixed a
> couple of months ago (resulting in the BUG_ON() you quoted not
> being there anymore), so you're simply not looking at up-to-date
> code.
>
> Jan
>
>> I think the reason is as
>> follows when sending packets from VM1 to VM2:
>> 1) The two netback threads (the first for VM1 sending, the second for VM2
>> receiving) run concurrently.
>> 2) The first netback thread does a delayed copy from a foreign granted
>> page to local memory when some outstanding packets have been pending too
>> long (more than half of one HZ).
>> netbk->mmap_pages[idx] is then replaced with a newly allocated page.
>> 3) If the packets are forwarded to VM2 by the virtual switch, netbk_gop_frag()
>> is called in the second netback thread.
>> That function checks whether the pages in skb frags[] are foreign
>> in order to decide how to do the grant copy.
>> 4) If the page replacement happens after the foreign-page check in
>> netbk_gop_frag(), the BUG is triggered because the page from skb frags[]
>> differs from mmap_pages[idx].
>>
>> I tried using a spin_lock to protect the page access, but found no
>> appropriate solution.
>> How can this problem be fixed? Could you share your thoughts?
>>
>> In addition, I have tried turning off copy_skb. The vif netdevice then may
>> not be released after shutting down the VM,
>> because outstanding packets hold the reference count of the device
>> too long for some unknown reason.
>> The reason may be that the NIC does not release packets after DMA.
>> Has anyone met such problems? Thanks.
>>
>> Best regards,
>> Jerry