Re: netback BUG_ON when using copy_skb=1

From: jerry <jerry.lilijun@huawei.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	"qianhuibin@huawei.com" <qianhuibin@huawei.com>,
	stefano.stabellini@eu.citrix.com, xiaowei.yang@huawei.com,
	wangfuhai@huawei.com, qinchuanyu@huawei.com,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: netback BUG_ON when using copy_skb=1
Date: Thu, 17 Oct 2013 18:26:35 +0800
Message-ID: <525FBB5B.4040609@huawei.com>
In-Reply-To: <525FB55B02000078000FBAFE@nat28.tlf.novell.com>

Hi Jan,

In my test, the grant table copy error can also crash the guest VM.
The stack is as follows:
kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
Pid: 2658, comm: iperf Not tainted 2.6.32-220.el6.x86_64 #1 Xen HVM domU
RIP: 0010:[<ffffffffa01166ca>]  [<ffffffffa01166ca>] xennet_tx_buf_gc+0x18a/0x1f0 [xen_netfront]
RSP: 0018:ffff880004403df8  EFLAGS: 00010096
RAX: 0000000000000049 RBX: ffff8800821986e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff880004403e48 R08: ffffffff81c00690 R09: 0000000000000080
R10: 0000000000013030 R11: 0000000000000000 R12: 000000000000003b
R13: 000000000000023d R14: 0000000000000011 R15: 0000000000000011
FS:  00007fd8fd97e700(0000) GS:ffff880004400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000030270aab70 CR3: 0000000080cf4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iperf (pid: 2658, threadinfo ffff8800813ba000, task ffff880080d0eb00)
Stack:
 ffff880082198020 ffff880082198f90 ffff88007f8d00c0 0000003f04415fc0
<0> ffff880004403e28 ffff880082198768 ffff880082198020 ffff8800821986e0
<0> 0000000000000282 0000000000000100 ffff880004403e78 ffffffffa0117d4c
Call Trace:
 <IRQ>
 [<ffffffffa0117d4c>] xennet_interrupt+0x4c/0xb0 [xen_netfront]
 [<ffffffff810d94f0>] handle_IRQ_event+0x60/0x170
 [<ffffffff8109b8a3>] ? ktime_get+0x63/0xe0
 [<ffffffff810dbc2e>] handle_edge_irq+0xde/0x180
 [<ffffffff812fe809>] __xen_evtchn_do_upcall+0x1b9/0x1f0
 [<ffffffff812fedbf>] xen_evtchn_do_upcall+0x2f/0x50
 [<ffffffff8100c373>] xen_hvm_callback_vector+0x13/0x20

The code in xennet_tx_buf_gc() (xen-netfront.c) that triggers the BUG is:
			if (unlikely(gnttab_query_foreign_access(
				np->grant_tx_ref[id]) != 0)) {
				printk(KERN_ALERT "xennet_tx_buf_gc: warning "
				       "-- grant still in use by backend "
				       "domain.\n");
				BUG();

My guess at the cause is as follows:
1) Xen: _set_status(), called in the __gnttab_copy() hypercall and in __acquire_grant_for_copy(), fails, and access to the grant ref is never ended.
        As a result, the GTF_reading bit cannot be cleared.
2) Netfront: the module hits BUG() when its check finds that the GTF_reading bit is still set.
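
To make the suspected sequence concrete, here is a minimal user-space sketch. It is only a model, not the Xen or netfront source: the model_* names and the release_ok parameter are hypothetical, and the skipped release is my assumption about the failure path. Only the GTF_* bit values follow Xen's public grant_table.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits with the values used in Xen's public grant_table.h. */
#define GTF_permit_access 1u
#define GTF_reading       (1u << 3)
#define GTF_writing       (1u << 4)

/* Hypothetical, simplified stand-in for one shared grant entry. */
struct model_grant_entry {
    uint16_t flags;
    uint16_t domid;
};

/* Models the frontend check: non-zero means the grant is still in use. */
static int model_query_foreign_access(const struct model_grant_entry *e)
{
    assert(e != NULL);
    return e->flags & (GTF_reading | GTF_writing);
}

/*
 * Models a grant copy: mark the entry as being read, then release it.
 * When release_ok is false, the release step is skipped. This is the
 * assumed failure, not confirmed hypervisor behaviour.
 */
static void model_grant_copy(struct model_grant_entry *e, bool release_ok)
{
    assert(e != NULL);
    e->flags |= GTF_reading;
    if (release_ok)
        e->flags &= (uint16_t)~GTF_reading;
}

/* Returns true if a tx_buf_gc-style check would reach BUG(). */
static bool model_tx_gc_would_bug(bool release_ok)
{
    struct model_grant_entry e = { .flags = GTF_permit_access, .domid = 0 };

    model_grant_copy(&e, release_ok);
    return model_query_foreign_access(&e) != 0;
}
```

In this model, model_tx_gc_would_bug(true) is false and model_tx_gc_would_bug(false) is true, which matches the crash above if the release really is skipped on the error path.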

Regards,
Jerry

On 2013/10/17 16:00, Jan Beulich wrote:
>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@huawei.com> wrote:
>> But there may be still concurrency problems in my test.
>> If the page replacing in copy_pending_req() was done after 
>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly marked 
>> with GNTCOPY_source_gref.
>> Here the memory of that page in skb has been replaced with Dom0 local 
>> memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in 
>> netbk_rx_actions() will get errors.
>> The messages is shown as:
>>
>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>
>> Would you like to share some opinions?
> 
> At a first glance that seems possible, but the question is - does it
> cause any problems other than the quoted message to be issued
> (and the problematic packet getting re-transmitted)? I'm asking
> mainly because fixing this would appear to imply adding locking to
> these paths - with the risk of adversely affecting performance.
> 
> Jan
