From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: VM started to hang after a system update
Date: Mon, 29 Jul 2013 13:58:54 +0300
Message-ID: <20130729105854.GA6847@redhat.com>
References: <51F62F89.40301@semihalf.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org
To: Artur Samborski
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:10781 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753675Ab3G2K5b (ORCPT ); Mon, 29 Jul 2013 06:57:31 -0400
Content-Disposition: inline
In-Reply-To: <51F62F89.40301@semihalf.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

https://bugzilla.redhat.com/show_bug.cgi?id=975065 ?

On Mon, Jul 29, 2013 at 11:02:01AM +0200, Artur Samborski wrote:
> Hello,
>
> we have another problem with KVM on our production machines.
>
> After updating the OS (Fedora Core 18) our KVM virtual machines
> started to crash. Tests have shown that these crashes are associated
> with a heavy load of network traffic.
>
> When the virtual machine hangs, this message appears in the KVM-host
> kernel (3.9.9-201.fc18.x86_64) log:
>
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] put_page+0x11/0x60
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: binfmt_misc ip6table_filter ip6_tables
> ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> xt_CHECKSUM iptable_mangle bridge stp llc be2iscsi iscsi_boot_sysfs
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser
> rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi e1000e iTCO_wdt
> iTCO_vendor_support ptp pps_core vhost_net ses ioatdma dcdbas mperf
> shpchp i7core_edac lpc_ich edac_core dca mfd_core tun macvtap
> macvlan enclosure bnx2 coretemp crc32c_intel serio_raw microcode
> kvm_intel acpi_power_meter kvm ipmi_devintf ipmi_si ipmi_msghandler
> mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core megaraid_sas
> wmi
> CPU 2
> Pid: 7524, comm: vhost-7521 Tainted: G W I
> 3.9.9-201.fc18.x86_64 #1 Dell Inc. PowerEdge R610/0K399H
> RIP: 0010:[] [] put_page+0x11/0x60
> RSP: 0018:ffff880427a31c28 EFLAGS: 00010296
> RAX: ffff88065d8e16c0 RBX: 0000000000000000 RCX: 0000000000000006
> RDX: 0000000000000150 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff880427a31c38 R08: 000000000000000a R09: 00000000000006f7
> R10: 0000000000000000 R11: 00000000000006f6 R12: ffff8808273c9d00
> R13: ffffffffa0180237 R14: ffff88067c8d43d8 R15: ffff8808273c9d00
> FS: 0000000000000000(0000) GS:ffff88083fc20000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000082864a000 CR4: 00000000000027e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process vhost-7521 (pid: 7524, threadinfo ffff880427a30000, task
> ffff880427f94650)
> Stack:
> ffffea001c159340 0000000000000013 ffff880427a31c58 ffffffff8154676f
> ffff8808273c9d00 ffff8808273c9d00 ffff880427a31c78 ffffffff8154680e
> ffffea001c15b080 ffff880828f45800 ffff880427a31ca8 ffffffff815468c6
> Call Trace:
> [] skb_release_data+0x8f/0x110
> [] __kfree_skb+0x1e/0xa0
> [] kfree_skb+0x36/0xa0
> [] macvtap_get_user+0x317/0x510 [macvtap]
> [] macvtap_sendmsg+0x2b/0x30 [macvtap]
> [] handle_tx+0x287/0x680 [vhost_net]
> [] handle_tx_kick+0x15/0x20 [vhost_net]
> [] vhost_worker+0xed/0x190 [vhost_net]
> [] ? vhost_work_flush+0x110/0x110 [vhost_net]
> [] kthread+0xc0/0xd0
> [] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0
> [] ? kthread_create_on_node+0x120/0x120
> [] ret_from_fork+0x7c/0xb0
> [] ? kthread_create_on_node+0x120/0x120
> Code: 45 fc 65 48 01 04 25 70 02 01 00 c9 c3 66 66 66 66 2e 0f 1f
> 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08
> <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f
> RIP [] put_page+0x11/0x60
> RSP
> CR2: 0000000000000000
> ---[ end trace cb305c3097c1de97 ]---
>
>
> After returning to the previously working kernel (3.7.0, manually
> compiled from kvm git sources), the problem still persists:
>
>
> BUG: unable to handle kernel paging request at 0000040200000401
> IP: [] put_page+0x5/0x50
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: binfmt_misc ip6table_filter ip6_tables
> ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack xt_CHECKSUM iptable_mangle be2iscsi iscsi_boot_sysfs
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser
> rdma_cm ib_addr iw_cm ib_cm ib_sa bridge stp llc ib_mad ib_core
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net
> coretemp e1000e ioatdma tun macvtap macvlan bnx2 iTCO_wdt shpchp
> crc32c_intel microcode dca ses iTCO_vendor_support lpc_ich wmi
> dcdbas kvm_intel i7core_edac edac_core enclosure joydev
> acpi_power_meter serio_raw pcspkr mfd_core kvm ipmi_devintf ipmi_si
> ipmi_msghandler megaraid_sas
> CPU 0
> Pid: 1505, comm: vhost-1502 Tainted: G W 3.7.0HYDRA_02+
> #1 Dell Inc. PowerEdge R610/0K399H
> RIP: 0010:[] [] put_page+0x5/0x50
> RSP: 0018:ffff880823e6bc50 EFLAGS: 00010202
> RAX: ffff88066d34bec0 RBX: 0000000000000012 RCX: ffffea0019cf001c
> RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000040200000401
> RBP: ffff880823e6bc68 R08: ffff880823e444f8 R09: 0000000000000010
> R10: 0000000000000000 R11: 00003ffffffff000 R12: ffff880827d34700
> R13: ffffffffa01371a8 R14: ffff880823e443d8 R15: ffff880827d34700
> FS: 0000000000000000(0000) GS:ffff88083fc00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000040200000401 CR3: 0000000825272000 CR4: 00000000000027e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process vhost-1502 (pid: 1505, threadinfo ffff880823e6a000, task
> ffff880825a1c5c0)
> Stack:
> ffffffff81520c1f ffff880827d34700 ffff880827d34700 ffff880823e6bc88
> ffffffff81520cbe ffffea0019cf69c0 ffff88042a1df400 ffff880823e6bcb8
> ffffffff81520d76 000000000000000c ffff88042a1df400 000000000000000c
> Call Trace:
> [] ? skb_release_data+0x8f/0x110
> [] __kfree_skb+0x1e/0xa0
> [] kfree_skb+0x36/0xa0
> [] macvtap_get_user+0x248/0x490 [macvtap]
> [] macvtap_sendmsg+0x2b/0x30 [macvtap]
> [] handle_tx+0x28a/0x680 [vhost_net]
> [] handle_tx_kick+0x15/0x20 [vhost_net]
> [] vhost_worker+0xed/0x190 [vhost_net]
> [] ? vhost_work_flush+0x110/0x110 [vhost_net]
> [] kthread+0xc0/0xd0
> [] ? ftrace_define_fields_xen_mc_entry+0x50/0xf0
> [] ? kthread_create_on_node+0x120/0x120
> [] ret_from_fork+0x7c/0xb0
> [] ? kthread_create_on_node+0x120/0x120
> Code: fc 00 00 00 00 e8 ac fe ff ff 48 63 45 fc 65 48 01 04 25 b8
> 06 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
> RIP [] put_page+0x5/0x50
> RSP
> CR2: 0000040200000401
>
>
> Only after a complete rollback to the previous state of the system
> does everything start to work properly again (the problem disappears).
> Hence the suspicion that it may be associated with some userspace
> tools?
>
> I will be grateful for any hints.
>
> Regards,
> Artur Samborski