VM started to hang after a system update

From: Artur Samborski @ 2013-07-29  9:02 UTC
  To: kvm

Hello,

we have another problem with KVM on our production machines.

After updating the OS (Fedora 18), our KVM virtual machines started
to crash. Tests have shown that these crashes occur under heavy
network traffic.

When the virtual machine hangs, the following message appears in the
kernel log of the KVM host (kernel 3.9.9-201.fc18.x86_64):


  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff81141af1>] put_page+0x11/0x60
  PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in: binfmt_misc ip6table_filter ip6_tables ebtable_nat 
ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle 
bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 
cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa 
ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
e1000e iTCO_wdt iTCO_vendor_support ptp pps_core vhost_net ses ioatdma 
dcdbas mperf shpchp i7core_edac lpc_ich edac_core dca mfd_core tun 
macvtap macvlan enclosure bnx2 coretemp crc32c_intel serio_raw microcode 
kvm_intel acpi_power_meter kvm ipmi_devintf ipmi_si ipmi_msghandler 
mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core megaraid_sas wmi
  CPU 2
  Pid: 7524, comm: vhost-7521 Tainted: G        W I 3.9.9-201.fc18.x86_64 #1 Dell Inc. PowerEdge R610/0K399H
  RIP: 0010:[<ffffffff81141af1>]  [<ffffffff81141af1>] put_page+0x11/0x60
  RSP: 0018:ffff880427a31c28  EFLAGS: 00010296
  RAX: ffff88065d8e16c0 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000150 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff880427a31c38 R08: 000000000000000a R09: 00000000000006f7
  R10: 0000000000000000 R11: 00000000000006f6 R12: ffff8808273c9d00
  R13: ffffffffa0180237 R14: ffff88067c8d43d8 R15: ffff8808273c9d00
  FS:  0000000000000000(0000) GS:ffff88083fc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000082864a000 CR4: 00000000000027e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process vhost-7521 (pid: 7524, threadinfo ffff880427a30000, task ffff880427f94650)
  Stack:
   ffffea001c159340 0000000000000013 ffff880427a31c58 ffffffff8154676f
   ffff8808273c9d00 ffff8808273c9d00 ffff880427a31c78 ffffffff8154680e
   ffffea001c15b080 ffff880828f45800 ffff880427a31ca8 ffffffff815468c6
  Call Trace:
   [<ffffffff8154676f>] skb_release_data+0x8f/0x110
   [<ffffffff8154680e>] __kfree_skb+0x1e/0xa0
   [<ffffffff815468c6>] kfree_skb+0x36/0xa0
   [<ffffffffa0180237>] macvtap_get_user+0x317/0x510 [macvtap]
   [<ffffffffa018045b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
   [<ffffffffa0258db7>] handle_tx+0x287/0x680 [vhost_net]
   [<ffffffffa02591e5>] handle_tx_kick+0x15/0x20 [vhost_net]
   [<ffffffffa025595d>] vhost_worker+0xed/0x190 [vhost_net]
   [<ffffffffa0255870>] ? vhost_work_flush+0x110/0x110 [vhost_net]
   [<ffffffff81082ba0>] kthread+0xc0/0xd0
   [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0
   [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
   [<ffffffff8166af2c>] ret_from_fork+0x7c/0xb0
   [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
  Code: 45 fc 65 48 01 04 25 70 02 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f
  RIP  [<ffffffff81141af1>] put_page+0x11/0x60
   RSP <ffff880427a31c28>
  CR2: 0000000000000000
  ---[ end trace cb305c3097c1de97 ]---
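
For anyone triaging the oops above, here is a minimal sketch (not part of
the original report) that pulls the register values out of the dump and
checks whether the faulting address in CR2 equals RDI. The marked bytes in
the Code: line (<48> f7 07 00 c0 00 00) appear to decode to
"testq $0xc000,(%rdi)", and on x86-64 RDI carries the first function
argument, so a match would suggest that put_page() was handed a bad page
pointer. The helper names parse_registers and faulting_symbol are
illustrative only; the excerpt is copied from the trace above.

```python
import re

# Excerpt copied from the oops above; only the lines this sketch needs.
OOPS = """\
RIP: 0010:[<ffffffff81141af1>]  [<ffffffff81141af1>] put_page+0x11/0x60
RDX: 0000000000000150 RSI: 0000000000000000 RDI: 0000000000000000
CR2: 0000000000000000 CR3: 000000082864a000 CR4: 00000000000027e0
"""


def parse_registers(text):
    """Return {register name: int} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_symbol(text):
    """Return the symbol name on the RIP line, or None if there is none."""
    match = re.search(r"RIP: .*\] (\w+)\+0x", text)
    return match.group(1) if match else None


regs = parse_registers(OOPS)
# Prints the faulting function, the value of RDI, and whether CR2 == RDI.
print(faulting_symbol(OOPS), hex(regs["RDI"]), regs["RDI"] == regs["CR2"])
```

Here RDI and CR2 are both zero, which is consistent with the "NULL pointer
dereference" wording of the BUG line.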


After returning to the previously working kernel (3.7.0, manually
compiled from the kvm git sources), the problem still persists:


  BUG: unable to handle kernel paging request at 0000040200000401
  IP: [<ffffffff8113e445>] put_page+0x5/0x50
  PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in: binfmt_misc ip6table_filter ip6_tables ebtable_nat 
ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle be2iscsi iscsi_boot_sysfs bnx2i cnic uio 
cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm 
ib_cm ib_sa bridge stp llc ib_mad ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi vhost_net coretemp e1000e ioatdma tun 
macvtap macvlan bnx2 iTCO_wdt shpchp crc32c_intel microcode dca ses 
iTCO_vendor_support lpc_ich wmi dcdbas kvm_intel i7core_edac edac_core 
enclosure joydev acpi_power_meter serio_raw pcspkr mfd_core kvm 
ipmi_devintf ipmi_si ipmi_msghandler megaraid_sas
  CPU 0
  Pid: 1505, comm: vhost-1502 Tainted: G        W    3.7.0HYDRA_02+ #1 Dell Inc. PowerEdge R610/0K399H
  RIP: 0010:[<ffffffff8113e445>]  [<ffffffff8113e445>] put_page+0x5/0x50
  RSP: 0018:ffff880823e6bc50  EFLAGS: 00010202
  RAX: ffff88066d34bec0 RBX: 0000000000000012 RCX: ffffea0019cf001c
  RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000040200000401
  RBP: ffff880823e6bc68 R08: ffff880823e444f8 R09: 0000000000000010
  R10: 0000000000000000 R11: 00003ffffffff000 R12: ffff880827d34700
  R13: ffffffffa01371a8 R14: ffff880823e443d8 R15: ffff880827d34700
  FS:  0000000000000000(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000040200000401 CR3: 0000000825272000 CR4: 00000000000027e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process vhost-1502 (pid: 1505, threadinfo ffff880823e6a000, task ffff880825a1c5c0)
  Stack:
   ffffffff81520c1f ffff880827d34700 ffff880827d34700 ffff880823e6bc88
   ffffffff81520cbe ffffea0019cf69c0 ffff88042a1df400 ffff880823e6bcb8
   ffffffff81520d76 000000000000000c ffff88042a1df400 000000000000000c
  Call Trace:
   [<ffffffff81520c1f>] ? skb_release_data+0x8f/0x110
   [<ffffffff81520cbe>] __kfree_skb+0x1e/0xa0
   [<ffffffff81520d76>] kfree_skb+0x36/0xa0
   [<ffffffffa01371a8>] macvtap_get_user+0x248/0x490 [macvtap]
   [<ffffffffa013741b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
   [<ffffffffa0165d2a>] handle_tx+0x28a/0x680 [vhost_net]
   [<ffffffffa0166155>] handle_tx_kick+0x15/0x20 [vhost_net]
   [<ffffffffa016295d>] vhost_worker+0xed/0x190 [vhost_net]
   [<ffffffffa0162870>] ? vhost_work_flush+0x110/0x110 [vhost_net]
   [<ffffffff81081750>] kthread+0xc0/0xd0
   [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0x50/0xf0
   [<ffffffff81081690>] ? kthread_create_on_node+0x120/0x120
   [<ffffffff8163fdac>] ret_from_fork+0x7c/0xb0
   [<ffffffff81081690>] ? kthread_create_on_node+0x120/0x120
  Code: fc 00 00 00 00 e8 ac fe ff ff 48 63 45 fc 65 48 01 04 25 b8 06 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
  RIP  [<ffffffff8113e445>] put_page+0x5/0x50
   RSP <ffff880823e6bc50>
  CR2: 0000040200000401
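
The two call traces can be compared mechanically. The sketch below (an
illustration, not part of the original report) strips addresses, offsets
and the "?" markers from the top six frames of each trace and checks that
both kernels fail on the same path. The frame lines are copied from the
two dumps above; the names FRAME and frames are made up for this example.

```python
import re

# Top frames of the 3.9.9-201.fc18 trace, copied from the first oops.
TRACE_3_9_9 = """\
 [<ffffffff8154676f>] skb_release_data+0x8f/0x110
 [<ffffffff8154680e>] __kfree_skb+0x1e/0xa0
 [<ffffffff815468c6>] kfree_skb+0x36/0xa0
 [<ffffffffa0180237>] macvtap_get_user+0x317/0x510 [macvtap]
 [<ffffffffa018045b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
 [<ffffffffa0258db7>] handle_tx+0x287/0x680 [vhost_net]
"""

# Top frames of the 3.7.0 trace, copied from the second oops.
TRACE_3_7_0 = """\
 [<ffffffff81520c1f>] ? skb_release_data+0x8f/0x110
 [<ffffffff81520cbe>] __kfree_skb+0x1e/0xa0
 [<ffffffff81520d76>] kfree_skb+0x36/0xa0
 [<ffffffffa01371a8>] macvtap_get_user+0x248/0x490 [macvtap]
 [<ffffffffa013741b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
 [<ffffffffa0165d2a>] handle_tx+0x28a/0x680 [vhost_net]
"""

# One frame: "[<address>] [? ]symbol+0xoffset/0xsize".
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x")


def frames(trace):
    """Return function names in order, ignoring offsets and '?' markers."""
    return [m.group(2) for m in FRAME.finditer(trace)]


same_path = frames(TRACE_3_9_9) == frames(TRACE_3_7_0)
print(same_path)
print(" <- ".join(frames(TRACE_3_9_9)))
```

In both dumps the failure is reached from handle_tx in vhost_net through
macvtap_sendmsg and macvtap_get_user, which then frees the skb.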


Only after a complete rollback of the system to its previous state does
everything work properly again (the problem disappears). Could it
therefore be associated with some userspace tools?
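
One way to narrow down a userspace suspect is to diff the installed
package sets before and after the update. The following is only a sketch
under stated assumptions: it expects two text listings with one
"name version-release" pair per line (such a listing could be produced
with rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n'), and the
package names and versions in the example are hypothetical, not taken
from the affected host.

```python
def parse_packages(listing):
    """Map package name -> version-release from 'name version-release' lines."""
    packages = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            packages[parts[0]] = parts[1]
    return packages


def changed_packages(before, after):
    """Return {name: (old, new)} for packages added, removed or updated."""
    old, new = parse_packages(before), parse_packages(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical listings, NOT taken from the affected host.
BEFORE = "pkg-a 1.0-1\npkg-b 2.0-1\n"
AFTER = "pkg-a 1.0-1\npkg-b 2.1-1\npkg-c 0.5-1\n"

print(changed_packages(BEFORE, AFTER))
```

A package that is absent on one side shows up with None in that position,
so newly installed and removed packages are listed alongside updated ones.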

I will be grateful for any hints.

Regards,
Artur Samborski


* Re: VM started to hang after a system update
From: Michael S. Tsirkin @ 2013-07-29 10:58 UTC
  To: Artur Samborski; +Cc: kvm

Is this the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=975065 ?

