From mboxrd@z Thu Jan 1 00:00:00 1970 From: Philipp Hahn Subject: RFH: kvm-0.14.1 on 2.6.32.42 stuck in flash_tlb_others_ipi() Date: Mon, 9 Jan 2012 12:41:41 +0100 Message-ID: <201201091241.45281.hahn@univention.de> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart11559267.y3LrOlOla1"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit To: kvm Return-path: Received: from mail.univention.de ([82.198.197.8]:2874 "EHLO mail.univention.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753685Ab2AILvk (ORCPT ); Mon, 9 Jan 2012 06:51:40 -0500 Received: from localhost (localhost [127.0.0.1]) by slugis.knut.univention.de (Postfix) with ESMTP id 299036EA181 for ; Mon, 9 Jan 2012 12:40:00 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by slugis.knut.univention.de (Postfix) with ESMTP id 1A3366EA201 for ; Mon, 9 Jan 2012 12:40:00 +0100 (CET) Received: from mail.univention.de ([127.0.0.1]) by localhost (slugis.knut.univention.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Teb-oGK5mKG9 for ; Mon, 9 Jan 2012 12:39:54 +0100 (CET) Received: from stave.knut.univention.de (stave.knut.univention.de [192.168.0.191]) by slugis.knut.univention.de (Postfix) with ESMTPSA id B64E16EA181 for ; Mon, 9 Jan 2012 12:39:54 +0100 (CET) Sender: kvm-owner@vger.kernel.org List-ID: --nextPart11559267.y3LrOlOla1 Content-Type: multipart/mixed; boundary="Boundary-01=_1JtCPoj7t3iX4a2" Content-Transfer-Encoding: 7bit Content-Disposition: inline --Boundary-01=_1JtCPoj7t3iX4a2 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Hello, one of our VMs regularly get stuck: the VM is completely unresponsive (no s= sh,=20 no serial console, no VNC). Using "gdbserver" and a remote system to debug= =20 the running VM, I see 3 CPUs (1,3,4) stuck in pgd_alloc() =E2=86=92 spin_lock_irqsave(pgd_lock) while the 4th CPU (2) is waiting in pgd_alloc() =E2=86=92 pgd_prepopulate_pmb() =E2=86=92... =E2=86=92 flush_= tlb_others_ipi() 195 while (!cpumask_empty(to_cpumask(f->flush_cpumask))) 196 cpu_relax(); (gdb) print f->flush_cpumask $5 =3D {1} CPU 1 is duing a do_exec() syscall, will CPU 2-4 are doing a do_fork() sysc= all=20 according to "thread apply all backtrace". After a "set variable f->flush_cpumask 0" from gdb the kernel continued=20 dumping the trace-informations, which I attached. Host: Debian linux-2.6.32-38-amd64 (=3D2.6.32.42), 8 Cores Kvm: 0.14.1+dfsg Guest: Debian linux-2.6.32-38-i686-bigmem, 4 CPUs Is this a known bug and/or is a fix available? I can gather more information from the VM if needd. Sincerely Philipp =2D-=20 Philipp Hahn Open Source Software Engineer hahn@univention.de Univention GmbH Linux for Your Business fon: +49 421 22 232- 0 Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99 http://www.univention.de/ --Boundary-01=_1JtCPoj7t3iX4a2 Content-Type: text/plain; name="dmesg.txt" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="dmesg.txt" [261850.404057] BUG: soft lockup - CPU#1 stuck for 219010s! [sshd:2707] [261850.404057] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.404057] [261850.404057] Pid: 2707, comm: sshd Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.404057] EIP: 0060:[] EFLAGS: 00000292 CPU: 1 [261850.404057] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.404057] EAX: 00000292 EBX: 00000003 ECX: 00000800 EDX: 00000292 [261850.404057] ESI: 00000000 EDI: 00000001 EBP: f55f3eb4 ESP: f55f3e64 [261850.404057] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.404057] CR0: 8005003b CR2: b72a1000 CR3: 35607000 CR4: 000006f0 [261850.404057] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.404057] DR6: ffff0ff0 DR7: 00000400 [261850.404057] Call Trace: [261850.404057] [] ? pgd_alloc+0x1c7/0x21a [261850.404057] [] mm_init+0xa8/0xd4 [261850.404057] [] ? dup_mm+0x6a/0x389 [261850.404057] [] ? copy_process+0x57a/0xf28 [261850.404057] [] ? copy_process+0x91b/0xf28 [261850.404057] [] ? do_fork+0x13a/0x2bc [261850.404057] [] ? sys_clone+0x21/0x27 [261850.404057] [] ? syscall_call+0x7/0xb [261850.404058] BUG: soft lockup - CPU#2 stuck for 218991s! [runsv:1236] [261850.404058] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.404058] [261850.404058] Pid: 1236, comm: runsv Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.404058] EIP: 0060:[] EFLAGS: 00000296 CPU: 2 [261850.404058] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.404058] EAX: 00000296 EBX: 00000003 ECX: 00000003 EDX: 00000296 [261850.404058] ESI: 00000000 EDI: 00000001 EBP: f602febc ESP: f602fe6c [261850.404058] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.404058] CR0: 8005003b CR2: 0810acf4 CR3: 36a91000 CR4: 000006f0 [261850.404058] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.404058] DR6: ffff0ff0 DR7: 00000400 [261850.404058] Call Trace: [261850.404058] [] ? pgd_alloc+0x1c7/0x21a [261850.404058] [] mm_init+0xa8/0xd4 [261850.404058] [] ? dup_mm+0x6a/0x389 [261850.404058] [] ? copy_process+0x57a/0xf28 [261850.404058] [] ? copy_process+0x91b/0xf28 [261850.404058] [] ? do_fork+0x13a/0x2bc [261850.404058] [] ? do_sync_read+0x0/0x107 [261850.404058] [] ? vfs_read+0x7b/0xd3 [261850.404058] [] ? sys_fork+0x15/0x19 [261850.404058] [] ? syscall_call+0x7/0xb [261850.438247] BUG: soft lockup - CPU#0 stuck for 218999s! [pbuilder-update:2713] [261850.438247] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.438247] [261850.438247] Pid: 2713, comm: pbuilder-update Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.438247] EIP: 0060:[] EFLAGS: 00200292 CPU: 0 [261850.438247] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.438247] EAX: 00200292 EBX: 00000003 ECX: 00000000 EDX: 00200292 [261850.438247] ESI: 00000000 EDI: 00000001 EBP: f54ddf4c ESP: f54ddefc [261850.438247] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.438247] CR0: 8005003b CR2: b73611a0 CR3: 35683000 CR4: 000006f0 [261850.438247] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.438247] DR6: ffff0ff0 DR7: 00000400 [261850.438247] Call Trace: [261850.438247] [] ? pgd_alloc+0x1c7/0x21a [261850.438247] [] ? _cond_resched+0x25/0x3c [261850.438247] [] ? wait_for_common+0x20/0x100 [261850.438247] [] mm_init+0xa8/0xd4 [261850.438247] [] ? bprm_mm_init+0x14/0x163 [261850.438247] [] ? do_execve+0xe3/0x25b [261850.438247] [] ? sys_execve+0x23/0x47 [261850.438247] [] ? syscall_call+0x7/0xb [261850.483139] BUG: soft lockup - CPU#3 stuck for 218991s! [nrpe:1254] [261850.483139] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.483139] [261850.483139] Pid: 1254, comm: nrpe Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.483139] EIP: 0060:[] EFLAGS: 00200292 CPU: 3 [261850.483139] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.483139] EAX: 00200292 EBX: 00000003 ECX: 00000004 EDX: 00200292 [261850.483139] ESI: 00000000 EDI: 00000001 EBP: f647feb4 ESP: f647fe64 [261850.483139] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.483139] CR0: 8005003b CR2: b74ee580 CR3: 36010000 CR4: 000006f0 [261850.483139] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.483139] DR6: ffff0ff0 DR7: 00000400 [261850.483139] Call Trace: [261850.483139] [] ? pgd_alloc+0x1c7/0x21a [261850.483139] [] mm_init+0xa8/0xd4 [261850.483139] [] ? dup_mm+0x6a/0x389 [261850.483139] [] ? copy_process+0x57a/0xf28 [261850.483139] [] ? copy_process+0x91b/0xf28 [261850.483139] [] ? do_fork+0x13a/0x2bc [261850.483139] [] ? copy_to_user+0x29/0xf8 [261850.483139] [] ? sys_clone+0x21/0x27 [261850.483139] [] ? syscall_call+0x7/0xb [261852.541339] nfs: RPC call returned error 88 --Boundary-01=_1JtCPoj7t3iX4a2-- --nextPart11559267.y3LrOlOla1 Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEABECAAYFAk8K0nUACgkQYPlgoZpUDjlmWwCfdbSWA88bqojV91TFPAgVSJ9A ztkAnRo2IdK9f8q5MtBeR2bBnk1BKAb1 =2nHd -----END PGP SIGNATURE----- --nextPart11559267.y3LrOlOla1--