From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philipp Hahn
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Wed, 18 Jun 2014 18:48:31 +0200
Message-ID: <53A1C2DF.10407@univention.de>
References: <5391976F.8020800@univention.de> <20140606105804.GD11959@zion.uk.xensource.com> <53923CD0.7010001@univention.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------010904030409090802000209"
Return-path:
Received: from mail6.bemta4.messagelabs.com ([85.158.143.247]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1WxJ2K-0006Xd-L0 for xen-devel@lists.xenproject.org; Wed, 18 Jun 2014 16:48:36 +0000
In-Reply-To: <53923CD0.7010001@univention.de>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: xen-devel, Erik Damrose, Ian Campbell, Zoltan Kiss
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------010904030409090802000209
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hello,

We are now more or less able to reproduce the OOPS within one hour by
constantly shutting down the VM and rebooting it:

> [32918.795695] XXXlan0: port 3(vif18.0) entered disabled state
> [32918.798732] BUG: unable to handle kernel paging request at ffffc90010da2188
> [32918.798823] IP: [] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.798911] PGD 95822067 PUD 95823067 PMD 94f47067 PTE 0
> [32918.798974] Oops: 0000 [#1] SMP
> [32918.799023] Modules linked in: xt_physdev xen_blkback xen_netback ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables xen_gntdev nfsv3 nfsv4 rpcsec_gss_krb5 nfsd nfs_acl auth_rpcgss oid_registry nfs fscache dns_resolver lockd sunrpc fuse loop xen_blkfront xen_evtchn blktap quota_v2 quota_tree xenfs xen_privcmd coretemp crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw snd_pcm gf128mul snd_timer glue_helper
snd aes_x86_64 soundcore snd_page_alloc microcode tpm_tis tpm tpm_bios pcspkr lpc_ich mfd_core acpi_power_meter i7core_edac mperf serio_raw i2c_i801 evdev edac_core processor ioatdma thermal_sys ext4 jbd2 crc16 bonding bridge stp llc dm_snapshot dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif hid_generic usbhid hid mptsas mptscsih mptbase scsi_transport_sas ehci_pci button uhci_hcd ehci_hcd usbcore usb_common igb dca i2c_algo_bit i2c_core ptp pps_core
> [32918.799958] CPU: 0 PID: 6450 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908
> [32918.800050] Hardware name: FUJITSU PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D94.3030 10/09/2012
> [32918.800137] task: ffff880093864880 ti: ffff88009266c000 task.ti: ffff88009266c000
> [32918.800220] RIP: e030:[] [] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.800314] RSP: e02b:ffff88009266dce8 EFLAGS: 00010212
> [32918.800364] RAX: ffffc9001082dac0 RBX: ffff880004d86ac0 RCX: ffffc90010da2000
> [32918.800419] RDX: 0000000000000031 RSI: 0000000000000000 RDI: ffff880004bdd280
> [32918.800474] RBP: ffff8800932db800 R08: 0000000000000000 R09: ffff8800952f3800
> [32918.800529] R10: 0000000000007ff0 R11: ffff88009c611380 R12: ffff8800932db800
> [32918.800584] R13: ffff88009266dd58 R14: ffffc90010821000 R15: 0000000000000000
> [32918.800642] FS: 00007f2f8fdcd700(0000) GS:ffff88009c600000(0000) knlGS:0000000000000000
> [32918.800728] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [32918.800778] CR2: ffffc90010da2188 CR3: 0000000093eb0000 CR4: 0000000000002660
> [32918.800834] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [32918.800889] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [32918.800943] Stack:
> [32918.800981] ffff880093864c60 000000008106d2af ffff88009c613ec0 ffff88009c613ec0
> [32918.801077] 0000000093864880 ffffc90010828ac0 ffffc90010821020 000000009c613ec0
> [32918.801173] 0000000000000000 0000000000000001 ffffc90010828ac0
ffffc9001082dac0
> [32918.801269] Call Trace:
> [32918.801314] [] ? _raw_spin_lock_irqsave+0x11/0x2f
> [32918.801368] [] ? xen_netbk_kthread+0x174/0x841 [xen_netback]
> [32918.801454] [] ? wake_up_bit+0x20/0x20
> [32918.801504] [] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
> [32918.801590] [] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801645] [] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
> [32918.801730] [] ? kthread+0xab/0xb3
> [32918.801781] [] ? xen_end_context_switch+0xe/0x1c
> [32918.801834] [] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801890] [] ? ret_from_fork+0x7c/0xb0
> [32918.801941] [] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801995] Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24
> [32918.802400] RIP [] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.802486] RSP
> [32918.802529] CR2: ffffc90010da2188
> [32918.802859] ---[ end trace baf81e34c52eb41c ]---

(gdb) list *(xen_netbk_rx_action+0x18b)
0xffffffffa04287dc is in xen_netbk_rx_action
(/var/build/temp/tmp.hW3dNilayw/pbuilder/linux-3.10.11/drivers/net/xen-netback/netback.c:611).
606                     meta->gso_size = skb_shinfo(skb)->gso_size;
607             else
608                     meta->gso_size = 0;
609
610             meta->size = 0;
611             meta->id = req->id;
612             npo->copy_off = 0;
613             npo->copy_gref = req->gref;
614
615             data = skb->data;

After more debugging today I think something like this happens:
1. The VM is receiving packets through bonding + bridge + netback +
   netfront.
2. For some unknown reason at least one packet remains in the rx queue
   and is not delivered to the domU immediately by netback.
3. The VM finishes shutting down.
4. The shared ring between dom0 and domU is freed.
5. xen-netback then continues processing the pending requests and tries
   to put the packet into the shared ring, which has already been
   released.
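
The register values in the OOPS can be cross-checked against the
faulting address with a few lines of userspace C. This is only a sketch
under assumptions: that RCX holds the base of the shared rx ring, that
one ring slot is 8 bytes, and that the constant 8 added to RDX just
before the faulting instruction skips a 64-byte ring header. The helper
names (fault_address, ring_slot, check_oops_registers) are made up for
illustration and are not part of netback.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the OOPS. */
#define OOPS_RCX 0xffffc90010da2000ULL /* presumed base of the shared rx ring */
#define OOPS_RDX 0x31ULL               /* masked index, after "add $0x8,%rdx" */
#define OOPS_CR2 0xffffc90010da2188ULL /* faulting address */

/* Assumptions, not taken from the kernel headers. */
#define SLOT_SIZE    8ULL /* size of one rx request/response slot */
#define HEADER_SLOTS 8ULL /* 8 slots * 8 bytes = assumed 64-byte header */

/* Address computed by "movzwl (%rcx,%rdx,8),%esi". */
static uint64_t fault_address(uint64_t base, uint64_t scaled_index)
{
	return base + scaled_index * SLOT_SIZE;
}

/* Ring slot that was being read, with the header offset removed. */
static uint64_t ring_slot(uint64_t scaled_index)
{
	return scaled_index - HEADER_SLOTS;
}

/* Aborts if the register dump is inconsistent with the assumptions. */
static void check_oops_registers(void)
{
	assert(fault_address(OOPS_RCX, OOPS_RDX) == OOPS_CR2);
	assert(ring_slot(OOPS_RDX) == 41);
}
```

Under these assumptions the computed address equals CR2, and the slot
being read is number 41 of the ring. That is consistent with a read of
req->id from a ring page that is no longer mapped (PTE 0 in the OOPS).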

From reading the attached disassembly I guess that
  AX  = &meta
  CX  = rx.sring
  DX  =~ rx.req_cons (apparently the masked ring index plus 8, see the
        "add $0x8,%rdx" before the faulting instruction)
  CR2 = &req->id
where
  CX + DX * sizeof(union of struct xen_netif_rx_{request,response}) = CR2
with that sizeof being 8.

Any additional ideas or insights are appreciated.

FYI: The host has only a single CPU and is running >=2 VMs so far.

>> There's one more patch that you can pick up from 3.10.y tree. I doubt it
>> will make much difference though.

Which patch are you referring to?

Sincerely
Philipp

--------------010904030409090802000209
Content-Type: text/plain; charset=UTF-8;
 name="xen-netback.s"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="xen-netback.s"

drivers/net/xen-netback/netback.c:582
 * frontend-side LRO).
 */
static int netbk_gop_skb(struct sk_buff *skb,
                         struct netrx_pending_operations *npo)
{
        struct xenvif *vif = netdev_priv(skb->dev);
     721:  48 81 c5 00 08 00 00    add    $0x800,%rbp
drivers/net/xen-netback/netback.c:594
        int old_meta_prod;

        old_meta_prod = npo->meta_prod;

        /* Set up a GSO prefix descriptor, if necessary */
        if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
     728:  66 83 7c 02 02 00       cmpw   $0x0,0x2(%rdx,%rax,1)
     72e:  74 53                   je     783
     730:  f6 45 60 04             testb  $0x4,0x60(%rbp)
     734:  74 4d                   je     783
drivers/net/xen-netback/netback.c:595
                req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
     736:  8b 55 4c                mov    0x4c(%rbp),%edx
     739:  48 8b 75 58             mov    0x58(%rbp),%rsi
     73d:  8b 7d 50                mov    0x50(%rbp),%edi
     740:  8d 42 01                lea    0x1(%rdx),%eax
     743:  ff cf                   dec    %edi
     745:  89 45 4c                mov    %eax,0x4c(%rbp)
drivers/net/xen-netback/netback.c:596
                meta = npo->meta + npo->meta_prod++;
     748:  8b 4c 24 48             mov    0x48(%rsp),%ecx
drivers/net/xen-netback/netback.c:599
                meta->gso_size = skb_shinfo(skb)->gso_size;
                meta->size = 0;
                meta->id = req->id;
     74c:  21 fa                   and    %edi,%edx
drivers/net/xen-netback/netback.c:596
        old_meta_prod = npo->meta_prod;

        /* Set up a GSO prefix descriptor, if necessary */
        if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
                req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
                meta = npo->meta +
                       npo->meta_prod++;
     74e:  89 c8                   mov    %ecx,%eax
     750:  ff c1                   inc    %ecx
     752:  89 4c 24 48             mov    %ecx,0x48(%rsp)
drivers/net/xen-netback/netback.c:597
                meta->gso_size = skb_shinfo(skb)->gso_size;
     756:  8b 8b d0 00 00 00       mov    0xd0(%rbx),%ecx
     75c:  4c 8b 83 d8 00 00 00    mov    0xd8(%rbx),%r8
drivers/net/xen-netback/netback.c:596
        old_meta_prod = npo->meta_prod;

        /* Set up a GSO prefix descriptor, if necessary */
        if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
                req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
                meta = npo->meta + npo->meta_prod++;
     763:  48 6b c0 0c             imul   $0xc,%rax,%rax
     767:  48 03 44 24 58          add    0x58(%rsp),%rax
drivers/net/xen-netback/netback.c:597
                meta->gso_size = skb_shinfo(skb)->gso_size;
     76c:  41 0f b7 4c 08 02       movzwl 0x2(%r8,%rcx,1),%ecx
drivers/net/xen-netback/netback.c:598
                meta->size = 0;
     772:  c7 40 04 00 00 00 00    movl   $0x0,0x4(%rax)
drivers/net/xen-netback/netback.c:597
        /* Set up a GSO prefix descriptor, if necessary */
        if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
                req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
                meta = npo->meta + npo->meta_prod++;
                meta->gso_size = skb_shinfo(skb)->gso_size;
     779:  89 48 08                mov    %ecx,0x8(%rax)
drivers/net/xen-netback/netback.c:599
                meta->size = 0;
                meta->id = req->id;
     77c:  0f b7 54 d6 40          movzwl 0x40(%rsi,%rdx,8),%edx
     781:  89 10                   mov    %edx,(%rax)
drivers/net/xen-netback/netback.c:602
        }

        req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
     783:  8b 55 50                mov    0x50(%rbp),%edx
     786:  8b 45 4c                mov    0x4c(%rbp),%eax
     789:  48 8b 4d 58             mov    0x58(%rbp),%rcx
     78d:  ff ca                   dec    %edx
     78f:  21 c2                   and    %eax,%edx
     791:  ff c0                   inc    %eax
     793:  89 45 4c                mov    %eax,0x4c(%rbp)
drivers/net/xen-netback/netback.c:603
        meta = npo->meta + npo->meta_prod++;
     796:  8b 74 24 48             mov    0x48(%rsp),%esi
     79a:  89 f0                   mov    %esi,%eax
     79c:  ff c6                   inc    %esi
     79e:  48 6b c0 0c             imul   $0xc,%rax,%rax
     7a2:  89 74 24 48             mov    %esi,0x48(%rsp)
     7a6:  48 03 44 24 58          add    0x58(%rsp),%rax
drivers/net/xen-netback/netback.c:605

        if (!vif->gso_prefix)
     7ab:  f6 45 60 04             testb  $0x4,0x60(%rbp)
     7af:  75 17                   jne    7c8
drivers/net/xen-netback/netback.c:606
                meta->gso_size = skb_shinfo(skb)->gso_size;
     7b1:  8b b3 d0 00 00 00       mov    0xd0(%rbx),%esi
     7b7:  48 8b bb d8 00 00 00    mov    0xd8(%rbx),%rdi
     7be:  0f b7 74 37 02          movzwl 0x2(%rdi,%rsi,1),%esi
     7c3:  89 70 08                mov    %esi,0x8(%rax)
     7c6:  eb 07                   jmp    7cf
drivers/net/xen-netback/netback.c:608
        else
                meta->gso_size = 0;
     7c8:  c7 40 08 00 00 00 00    movl   $0x0,0x8(%rax)
drivers/net/xen-netback/netback.c:611

        meta->size = 0;
        meta->id = req->id;
     7cf:  89 d2                   mov    %edx,%edx
drivers/net/xen-netback/netback.c:610
        if (!vif->gso_prefix)
                meta->gso_size = skb_shinfo(skb)->gso_size;
        else
                meta->gso_size = 0;

        meta->size = 0;
     7d1:  c7 40 04 00 00 00 00    movl   $0x0,0x4(%rax)
drivers/net/xen-netback/netback.c:611
        meta->id = req->id;
     7d8:  48 83 c2 08             add    $0x8,%rdx
     7dc:  0f b7 34 d1             movzwl (%rcx,%rdx,8),%esi
     7e0:  89 30                   mov    %esi,(%rax)

--------------010904030409090802000209
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--------------010904030409090802000209--