xen-devel.lists.xenproject.org archive mirror
From: Philipp Hahn <hahn@univention.de>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Erik Damrose <Damrose@univention.de>,
	Ian Campbell <ian.campbell@citrix.com>,
	Zoltan Kiss <zoltan.kiss@citrix.com>
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Wed, 18 Jun 2014 18:48:31 +0200
Message-ID: <53A1C2DF.10407@univention.de>
In-Reply-To: <53923CD0.7010001@univention.de>

[-- Attachment #1: Type: text/plain, Size: 5763 bytes --]

Hello,

We can now reproduce the OOPS fairly reliably within one hour by
repeatedly shutting down the VM and rebooting it:

> [32918.795695] XXXlan0: port 3(vif18.0) entered disabled state
> [32918.798732] BUG: unable to handle kernel paging request at ffffc90010da2188
> [32918.798823] IP: [<ffffffffa04287dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.798911] PGD 95822067 PUD 95823067 PMD 94f47067 PTE 0
> [32918.798974] Oops: 0000 [#1] SMP
> [32918.799023] Modules linked in: xt_physdev xen_blkback xen_netback ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables xen_gntdev nfsv3 nfsv4 rpcsec_gss_krb5 nfsd nfs_acl auth_rpcgss oid_registry nfs fscache dns_resolver lockd sunrpc fuse loop xen_blkfront xen_evtchn blktap quota_v2 quota_tree xenfs xen_privcmd coretemp crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw snd_pcm gf128mul snd_timer glue_helper snd aes_x86_64 soundcore snd_page_alloc microcode tpm_tis tpm tpm_bios pcspkr lpc_ich mfd_core acpi_power_meter i7core_edac mperf serio_raw i2c_i801 evdev edac_core processor ioatdma thermal_sys ext4 jbd2 crc16 bonding bridge stp llc dm_snapshot dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif hid_generic usbhid hid mptsas mptscsih mptbase scsi_transport_sas ehci_pci button uhci_hcd ehci_hcd usbcore usb_common igb dca i2c_algo_bit i2c_core ptp pps_core
> [32918.799958] CPU: 0 PID: 6450 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908
> [32918.800050] Hardware name: FUJITSU PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D94.3030 10/09/2012
> [32918.800137] task: ffff880093864880 ti: ffff88009266c000 task.ti: ffff88009266c000
> [32918.800220] RIP: e030:[<ffffffffa04287dc>]  [<ffffffffa04287dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.800314] RSP: e02b:ffff88009266dce8  EFLAGS: 00010212
> [32918.800364] RAX: ffffc9001082dac0 RBX: ffff880004d86ac0 RCX: ffffc90010da2000
> [32918.800419] RDX: 0000000000000031 RSI: 0000000000000000 RDI: ffff880004bdd280
> [32918.800474] RBP: ffff8800932db800 R08: 0000000000000000 R09: ffff8800952f3800
> [32918.800529] R10: 0000000000007ff0 R11: ffff88009c611380 R12: ffff8800932db800
> [32918.800584] R13: ffff88009266dd58 R14: ffffc90010821000 R15: 0000000000000000
> [32918.800642] FS:  00007f2f8fdcd700(0000) GS:ffff88009c600000(0000) knlGS:0000000000000000
> [32918.800728] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [32918.800778] CR2: ffffc90010da2188 CR3: 0000000093eb0000 CR4: 0000000000002660
> [32918.800834] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [32918.800889] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [32918.800943] Stack:
> [32918.800981]  ffff880093864c60 000000008106d2af ffff88009c613ec0 ffff88009c613ec0
> [32918.801077]  0000000093864880 ffffc90010828ac0 ffffc90010821020 000000009c613ec0
> [32918.801173]  0000000000000000 0000000000000001 ffffc90010828ac0 ffffc9001082dac0
> [32918.801269] Call Trace:
> [32918.801314]  [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
> [32918.801368]  [<ffffffffa042a033>] ? xen_netbk_kthread+0x174/0x841 [xen_netback]
> [32918.801454]  [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [32918.801504]  [<ffffffffa0429ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
> [32918.801590]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801645]  [<ffffffffa0429ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
> [32918.801730]  [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
> [32918.801781]  [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
> [32918.801834]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801890]  [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
> [32918.801941]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801995] Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24
> [32918.802400] RIP  [<ffffffffa04287dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.802486]  RSP <ffff88009266dce8>
> [32918.802529] CR2: ffffc90010da2188
> [32918.802859] ---[ end trace baf81e34c52eb41c ]---

(gdb) list *(xen_netbk_rx_action+0x18b)
0xffffffffa04287dc is in xen_netbk_rx_action (/var/build/temp/tmp.hW3dNilayw/pbuilder/linux-3.10.11/drivers/net/xen-netback/netback.c:611).
606                     meta->gso_size = skb_shinfo(skb)->gso_size;
607             else
608                     meta->gso_size = 0;
609
610             meta->size = 0;
611             meta->id = req->id;
612             npo->copy_off = 0;
613             npo->copy_gref = req->gref;
614
615             data = skb->data;


After more debugging today I think something like this happens:

1. The VM is receiving packets through bonding + bridge + netback +
netfront.

2. For some unknown reason at least one packet remains in the rx queue
and is not delivered to the domU immediately by netback.

3. The VM finishes shutting down.

4. The shared ring between dom0 and domU is freed.

5. Then xen-netback continues processing the pending requests and tries
to put the packet into the already released shared ring.


From reading the attached disassembly, I guess that
 AX = &meta
 CX = &rx->sring
 DX =~ rx.req_cons (masked index, plus 8 for the ring header)
 CR2 = &req->id
where
 CX + DX * 8 = CR2
with 8 = sizeof(union of struct xen_netif_rx_request and
struct xen_netif_rx_response).


Any additional ideas or insights are appreciated.

FYI: The host has only a single CPU and is running >=2 VMs so far.

>> There's one more patch that you can pick up from 3.10.y tree. I doubt it
>> will make much difference though.

Which patch are you referring to?

Sincerely
Philipp

[-- Attachment #2: xen-netback.s --]
[-- Type: text/plain, Size: 5282 bytes --]

drivers/net/xen-netback/netback.c:582
 * frontend-side LRO).
 */
static int netbk_gop_skb(struct sk_buff *skb,
			 struct netrx_pending_operations *npo)
{
	struct xenvif *vif = netdev_priv(skb->dev);
     721:	48 81 c5 00 08 00 00 	add    $0x800,%rbp
drivers/net/xen-netback/netback.c:594
	int old_meta_prod;

	old_meta_prod = npo->meta_prod;

	/* Set up a GSO prefix descriptor, if necessary */
	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
     728:	66 83 7c 02 02 00    	cmpw   $0x0,0x2(%rdx,%rax,1)
     72e:	74 53                	je     783 <xen_netbk_rx_action+0x132>
     730:	f6 45 60 04          	testb  $0x4,0x60(%rbp)
     734:	74 4d                	je     783 <xen_netbk_rx_action+0x132>
drivers/net/xen-netback/netback.c:595
		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
     736:	8b 55 4c             	mov    0x4c(%rbp),%edx
     739:	48 8b 75 58          	mov    0x58(%rbp),%rsi
     73d:	8b 7d 50             	mov    0x50(%rbp),%edi
     740:	8d 42 01             	lea    0x1(%rdx),%eax
     743:	ff cf                	dec    %edi
     745:	89 45 4c             	mov    %eax,0x4c(%rbp)
drivers/net/xen-netback/netback.c:596
		meta = npo->meta + npo->meta_prod++;
     748:	8b 4c 24 48          	mov    0x48(%rsp),%ecx
drivers/net/xen-netback/netback.c:599
		meta->gso_size = skb_shinfo(skb)->gso_size;
		meta->size = 0;
		meta->id = req->id;
     74c:	21 fa                	and    %edi,%edx
drivers/net/xen-netback/netback.c:596
	old_meta_prod = npo->meta_prod;

	/* Set up a GSO prefix descriptor, if necessary */
	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
		meta = npo->meta + npo->meta_prod++;
     74e:	89 c8                	mov    %ecx,%eax
     750:	ff c1                	inc    %ecx
     752:	89 4c 24 48          	mov    %ecx,0x48(%rsp)
drivers/net/xen-netback/netback.c:597
		meta->gso_size = skb_shinfo(skb)->gso_size;
     756:	8b 8b d0 00 00 00    	mov    0xd0(%rbx),%ecx
     75c:	4c 8b 83 d8 00 00 00 	mov    0xd8(%rbx),%r8
drivers/net/xen-netback/netback.c:596
	old_meta_prod = npo->meta_prod;

	/* Set up a GSO prefix descriptor, if necessary */
	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
		meta = npo->meta + npo->meta_prod++;
     763:	48 6b c0 0c          	imul   $0xc,%rax,%rax
     767:	48 03 44 24 58       	add    0x58(%rsp),%rax
drivers/net/xen-netback/netback.c:597
		meta->gso_size = skb_shinfo(skb)->gso_size;
     76c:	41 0f b7 4c 08 02    	movzwl 0x2(%r8,%rcx,1),%ecx
drivers/net/xen-netback/netback.c:598
		meta->size = 0;
     772:	c7 40 04 00 00 00 00 	movl   $0x0,0x4(%rax)
drivers/net/xen-netback/netback.c:597

	/* Set up a GSO prefix descriptor, if necessary */
	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
		meta = npo->meta + npo->meta_prod++;
		meta->gso_size = skb_shinfo(skb)->gso_size;
     779:	89 48 08             	mov    %ecx,0x8(%rax)
drivers/net/xen-netback/netback.c:599
		meta->size = 0;
		meta->id = req->id;
     77c:	0f b7 54 d6 40       	movzwl 0x40(%rsi,%rdx,8),%edx
     781:	89 10                	mov    %edx,(%rax)
drivers/net/xen-netback/netback.c:602
	}

	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
     783:	8b 55 50             	mov    0x50(%rbp),%edx
     786:	8b 45 4c             	mov    0x4c(%rbp),%eax
     789:	48 8b 4d 58          	mov    0x58(%rbp),%rcx
     78d:	ff ca                	dec    %edx
     78f:	21 c2                	and    %eax,%edx
     791:	ff c0                	inc    %eax
     793:	89 45 4c             	mov    %eax,0x4c(%rbp)
drivers/net/xen-netback/netback.c:603
	meta = npo->meta + npo->meta_prod++;
     796:	8b 74 24 48          	mov    0x48(%rsp),%esi
     79a:	89 f0                	mov    %esi,%eax
     79c:	ff c6                	inc    %esi
     79e:	48 6b c0 0c          	imul   $0xc,%rax,%rax
     7a2:	89 74 24 48          	mov    %esi,0x48(%rsp)
     7a6:	48 03 44 24 58       	add    0x58(%rsp),%rax
drivers/net/xen-netback/netback.c:605

	if (!vif->gso_prefix)
     7ab:	f6 45 60 04          	testb  $0x4,0x60(%rbp)
     7af:	75 17                	jne    7c8 <xen_netbk_rx_action+0x177>
drivers/net/xen-netback/netback.c:606
		meta->gso_size = skb_shinfo(skb)->gso_size;
     7b1:	8b b3 d0 00 00 00    	mov    0xd0(%rbx),%esi
     7b7:	48 8b bb d8 00 00 00 	mov    0xd8(%rbx),%rdi
     7be:	0f b7 74 37 02       	movzwl 0x2(%rdi,%rsi,1),%esi
     7c3:	89 70 08             	mov    %esi,0x8(%rax)
     7c6:	eb 07                	jmp    7cf <xen_netbk_rx_action+0x17e>
drivers/net/xen-netback/netback.c:608
	else
		meta->gso_size = 0;
     7c8:	c7 40 08 00 00 00 00 	movl   $0x0,0x8(%rax)
drivers/net/xen-netback/netback.c:611

	meta->size = 0;
	meta->id = req->id;
     7cf:	89 d2                	mov    %edx,%edx
drivers/net/xen-netback/netback.c:610
	if (!vif->gso_prefix)
		meta->gso_size = skb_shinfo(skb)->gso_size;
	else
		meta->gso_size = 0;

	meta->size = 0;
     7d1:	c7 40 04 00 00 00 00 	movl   $0x0,0x4(%rax)
drivers/net/xen-netback/netback.c:611
	meta->id = req->id;
     7d8:	48 83 c2 08          	add    $0x8,%rdx
     7dc:	0f b7 34 d1          	movzwl (%rcx,%rdx,8),%esi
     7e0:	89 30                	mov    %esi,(%rax)
