From mboxrd@z Thu Jan 1 00:00:00 1970
From: jerry
Subject: Re: netback BUG_ON when using copy_skb=1
Date: Thu, 17 Oct 2013 18:26:35 +0800
Message-ID: <525FBB5B.4040609@huawei.com>
References: <525E125B.80100@huawei.com>
 <525E903602000078000FB6DF@nat28.tlf.novell.com>
 <525F94BF.6050500@huawei.com>
 <525FB55B02000078000FBAFE@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) id 1VWknE-00021a-0m
 for xen-devel@lists.xenproject.org; Thu, 17 Oct 2013 10:27:00 +0000
In-Reply-To: <525FB55B02000078000FBAFE@nat28.tlf.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Wei Liu, "qianhuibin@huawei.com", stefano.stabellini@eu.citrix.com,
 xiaowei.yang@huawei.com, wangfuhai@huawei.com, qinchuanyu@huawei.com,
 xen-devel
List-Id: xen-devel@lists.xenproject.org

Hi Jan,

In my test, the grant table copy error can also crash the VM.
The stack trace is as follows:

kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
Pid: 2658, comm: iperf Not tainted 2.6.32-220.el6.x86_64 #1 Xen HVM domU
RIP: 0010:[] [] xennet_tx_buf_gc+0x18a/0x1f0 [xen_netfront]
RSP: 0018:ffff880004403df8 EFLAGS: 00010096
RAX: 0000000000000049 RBX: ffff8800821986e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff880004403e48 R08: ffffffff81c00690 R09: 0000000000000080
R10: 0000000000013030 R11: 0000000000000000 R12: 000000000000003b
R13: 000000000000023d R14: 0000000000000011 R15: 0000000000000011
FS: 00007fd8fd97e700(0000) GS:ffff880004400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000030270aab70 CR3: 0000000080cf4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iperf (pid: 2658, threadinfo ffff8800813ba000, task ffff880080d0eb00)
Stack:
 ffff880082198020 ffff880082198f90 ffff88007f8d00c0 0000003f04415fc0
<0> ffff880004403e28 ffff880082198768 ffff880082198020 ffff8800821986e0
<0> 0000000000000282 0000000000000100 ffff880004403e78 ffffffffa0117d4c
Call Trace:
 [] xennet_interrupt+0x4c/0xb0 [xen_netfront]
 [] handle_IRQ_event+0x60/0x170
 [] ? ktime_get+0x63/0xe0
 [] handle_edge_irq+0xde/0x180
 [] __xen_evtchn_do_upcall+0x1b9/0x1f0
 [] xen_evtchn_do_upcall+0x2f/0x50
 [] xen_hvm_callback_vector+0x13/0x20

The code that triggers the BUG in xennet_tx_buf_gc() in xen-netfront.c is:

	if (unlikely(gnttab_query_foreign_access(
		np->grant_tx_ref[id]) != 0)) {
		printk(KERN_ALERT "xennet_tx_buf_gc: warning "
		       "-- grant still in use by backend "
		       "domain.\n");
		BUG();
	}

My guess at the cause is as follows:

1) XEN: _set_status(), which is called in the hypercall path
   __gnttab_copy() and __acquire_grant_for_copy(), fails, and access
   to the grant ref is not ended. So the GTF_reading bit cannot be
   cleared.
2) Netfront: the module hits the BUG() when it checks the grant and
   finds that the GTF_reading bit is still set.

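To make the guess above a bit more concrete, here is a minimal user-space sketch of the two steps. It is only a model, not the Xen or netfront code: the struct and the model_* functions are hypothetical names made up for illustration. The GTF_reading and GTF_writing bit positions are the ones in Xen's public grant_table.h, and the in-use test mirrors, as far as I know, what mainline Linux does for v1 grant entries; the redhat6.2 xen-vnif driver may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as defined in Xen's public grant_table.h. */
#define GTF_reading (1U << 3)
#define GTF_writing (1U << 4)

/* Hypothetical, simplified stand-in for one shared grant entry. */
struct model_grant_entry {
	uint16_t flags;
};

/* Models the hypervisor pinning the grant for reading when a copy starts. */
static void model_acquire_for_copy(struct model_grant_entry *e)
{
	e->flags |= GTF_reading;
}

/*
 * Models the release at the end of the copy. When release_ok is false
 * (the assumed failure from step 1), the GTF_reading bit stays set.
 */
static void model_release_after_copy(struct model_grant_entry *e,
				     bool release_ok)
{
	if (release_ok)
		e->flags &= (uint16_t)~GTF_reading;
}

/* Models the frontend check: any reading/writing bit means "still in use". */
static bool model_still_in_use(const struct model_grant_entry *e)
{
	return (e->flags & (GTF_reading | GTF_writing)) != 0;
}
```

In this model, calling model_acquire_for_copy() and then model_release_after_copy() with release_ok set to false leaves model_still_in_use() returning true, which is the condition under which xennet_tx_buf_gc() calls BUG().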
Regards,
Jerry

On 2013/10/17 16:00, Jan Beulich wrote:
>>>> On 17.10.13 at 09:41, jerry wrote:
>> But there may be still concurrency problems in my test.
>> If the page replacing in copy_pending_req() was done after
>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly marked
>> with GNTCOPY_source_gref.
>> Here the memory of that page in skb has been replaced with Dom0 local
>> memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in
>> netbk_rx_actions() will get errors.
>> The messages is shown as:
>>
>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>
>> Would you like to share some opinions?
>
> At a first glance that seems possible, but the question is - does it
> cause any problems other than the quoted message to be issued
> (and the problematic packet getting re-transmitted)? I'm asking
> mainly because fixing this would appear to imply adding locking to
> these paths - with the risk of adversely affecting performance.
>
> Jan
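
As a rough illustration of the ordering problem discussed in the quoted text, the following sketch models it as two steps whose order decides whether the copy flags still match the page. All model_* names and both structs are hypothetical; they only stand in for the roles that netif_get_page_ext()/netbk_gop_frag() and copy_pending_req() play in the description above and do not reproduce the netback code. GNTCOPY_source_gref uses the value from Xen's public grant_table.h.

```c
#include <stdbool.h>

/* Value as defined in Xen's public grant_table.h. */
#define GNTCOPY_source_gref (1U << 0)

/* Hypothetical model of a fragment page: foreign (granted) or Dom0 local. */
struct model_page {
	bool foreign;
};

/* Hypothetical model of the copy operation prepared for that page. */
struct model_copy_op {
	unsigned int flags;
};

/* Step A: classify the page and record the result in the copy flags. */
static void model_prepare_copy(const struct model_page *pg,
			       struct model_copy_op *op)
{
	op->flags = pg->foreign ? GNTCOPY_source_gref : 0;
}

/* Step B: the page's memory is replaced with Dom0 local memory. */
static void model_replace_with_local(struct model_page *pg)
{
	pg->foreign = false;
}

/* True when the flags no longer describe the page at the time of the copy. */
static bool model_flags_stale(const struct model_page *pg,
			      const struct model_copy_op *op)
{
	bool says_gref = (op->flags & GNTCOPY_source_gref) != 0;

	return says_gref != pg->foreign;
}
```

If step A runs before step B, model_flags_stale() returns true: the operation still claims a grant reference as its source although the page is now local. Running B before A keeps the two consistent, which is the ordering that locking on these paths would have to enforce.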