From: jerry <jerry.lilijun@huawei.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Wei Liu <wei.liu2@citrix.com>,
	stefano.stabellini@eu.citrix.com
Subject: Re: netback BUG_ON when using copy_skb=1
Date: Thu, 17 Oct 2013 15:41:51 +0800
Message-ID: <525F94BF.6050500@huawei.com>
In-Reply-To: <525E903602000078000FB6DF@nat28.tlf.novell.com>

Hi Jan,

Thanks for your reply.
Yes, I am using the SLE11 kernel 3.0.58, which, as you assumed, is not up to date.
I found a related patch named xen-netback-generalize, which was committed on Aug 7 and has been applied to the SLE11 kernel 3.0.98.
That patch removes the BUG_ON(netbk->mmap_pages[idx] != page).

But there may still be concurrency problems in my test.
If the page replacement in copy_pending_req() happens after netif_get_page_ext() has been called in netbk_gop_frag(), copy_gop->flags is wrongly marked with GNTCOPY_source_gref.
At that point the page in the skb has already been replaced with Dom0 local memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in netbk_rx_actions() fails.
The error message is:

(XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)

Could you share your thoughts on this?

Regards,
Jerry

On 2013/10/16 19:10, Jan Beulich wrote:
>>>> On 16.10.13 at 06:13, jerry <jerry.lilijun@huawei.com> wrote:
>> Hi Wei Liu,
>>
>> I am doing some network performance testing on Xen 4.1.2 and kernel 3.0, and
>> occasionally get a crash with BUG_ON(netbk->mmap_pages[idx] != page) in
>> netbk_gop_frag().
>>
>> By analyzing the module drivers/xen/netback,
> 
> You aren't looking at the upstream driver, are you? If so, Wei is
> very likely the wrong addressee.
> 
> Assuming that you instead talk of the SLE11 kernel, I can only
> point out that a problem in that code was found and fixed a
> couple of months ago (resulting in the BUG_ON() you quoted not
> being there anymore), so you're simply not looking at up-to-date
> code.
> 
> Jan
> 
>> I think the reason is as
>> follows when sending packets from VM1 to VM2:
>> 1) The two netback threads (the first for VM1 sending, the second for VM2
>> receiving) run concurrently.
>> 2) The first netback thread does a delayed copy from a foreign granted
>> page to local memory when some outstanding packets have been pending too
>> long (more than HZ/2).
>>    Then netbk->mmap_pages[idx] is replaced with the newly allocated page.
>> 3) If the packets are forwarded to VM2 by the virtual switch, netbk_gop_frag()
>> is called in the second netback thread.
>>    That function checks whether the pages in skb frags[] are foreign
>> in order to decide how to do the grant copy.
>> 4) If the page replacement happens after the foreign-page check in
>> netbk_gop_frag(), the BUG is triggered because the page from skb frags[]
>> differs from mmap_pages[idx].
>>
>> I tried using a spin_lock to protect the page access, but found no appropriate
>> solution.
>> How can this problem be fixed? Could you share your thoughts?
>>
>> In addition, I have tried turning off copy_skb. Then the vif netdevice may
>> not be released after shutting down the VM,
>> because outstanding packets hold the reference count of the device
>> too long for some unknown reason.
>> The reason may be that the NIC does not release packets after DMA.
>> Has anyone met such problems? Thanks.
>>
>> Best regards,
>> Jerry
