From mboxrd@z Thu Jan 1 00:00:00 1970
From: zhanghailiang
Subject: [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
Date: Mon, 6 Jul 2015 15:54:20 +0800
Message-ID: <559A342C.6020207@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: , "qemu-devel@nongnu.org"
To:
Return-path:
Received: from szxga01-in.huawei.com ([58.251.152.64]:59417 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753018AbbGFHye (ORCPT ); Mon, 6 Jul 2015 03:54:34 -0400
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi,

Recently we encountered a problem in our project: two CPUs in a VM are not brought up normally after a reboot.

Our host is using KVM kmod 3.6 and QEMU 2.1.
The guest is a SLES 11 SP3 VM configured with 8 vCPUs, and the CPU model is configured as 'host-passthrough'.

After the VM was started for the first time, everything seemed to be OK. Then the VM panicked and rebooted. After the reboot, only 6 CPUs were brought up in the VM; cpu1 and cpu7 are not online.

This is the only message we can get from the VM. The VM dmesg shows:
[ 0.069867] Booting Node 0, Processors #1
[ 5.060042] CPU1: Stuck ??
[ 5.060499] #2
[ 5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
[ 5.088335] KVM setup async PF for cpu 2
[ 5.092967] NMI watchdog enabled, takes one hw-pmu counter.
[ 5.094405] #3
[ 5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
[ 5.108333] KVM setup async PF for cpu 3
[ 5.113553] NMI watchdog enabled, takes one hw-pmu counter.
[ 5.114970] #4
[ 5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
[ 5.128336] KVM setup async PF for cpu 4
[ 5.134576] NMI watchdog enabled, takes one hw-pmu counter.
[ 5.135998] #5
[ 5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
[ 5.152334] KVM setup async PF for cpu 5
[ 5.154764] NMI watchdog enabled, takes one hw-pmu counter.
[ 5.156467] #6
[ 5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
[ 5.172341] KVM setup async PF for cpu 6
[ 5.180738] NMI watchdog enabled, takes one hw-pmu counter.
[ 5.182173] #7 Ok.
[ 10.170815] CPU7: Stuck ??
[ 10.171648] Brought up 6 CPUs
[ 10.172394] Total of 6 processors activated (28799.97 BogoMIPS).

On the host, we found that the QEMU vcpu1 and vcpu7 threads were not consuming any CPU time (they should be in the idle state). The stacks of all the vCPU threads on the host look like this:

[] kvm_vcpu_block+0x65/0xa0 [kvm]
[] __vcpu_run+0xd1/0x260 [kvm]
[] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[] do_vfs_ioctl+0x8b/0x3b0
[] sys_ioctl+0xa1/0xb0
[] system_call_fastpath+0x16/0x1b
[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
[] 0xffffffffffffffff

We looked into the kernel code that could lead to the above 'Stuck' warning, and found that the only possible cause is that something is wrong with the emulation of the 'cpuid' instruction in kvm/qemu. But since we can't reproduce this problem, we are not quite sure.
Is it possible that the cpuid emulation in kvm/qemu has a bug?

Has anyone come across this problem before? Or does anyone have any ideas?

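For reference, the guest log above can be summarised with a short script. This is only a sketch: the names `DMESG` and `stuck_cpus` are made up for illustration, and the excerpt is trimmed to the lines that matter. It pairs each '#N' boot marker with the matching 'CPUN: Stuck ??' line and reports how long the guest waited in between.

```python
import re

# Trimmed copy of the guest dmesg from the report (only the relevant lines).
DMESG = """\
[    0.069867] Booting Node 0, Processors  #1
[    5.060042] CPU1: Stuck ??
[    5.060499]  #2
[    5.182173]  #7 Ok.
[   10.170815] CPU7: Stuck ??
[   10.171648] Brought up 6 CPUs
"""

LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")   # "[ timestamp] text"
START = re.compile(r"#(\d+)")                     # "#N" boot marker
STUCK = re.compile(r"CPU(\d+): Stuck \?\?")       # failure message


def stuck_cpus(dmesg):
    """Return {cpu: seconds between its '#N' marker and 'CPUN: Stuck ??'}."""
    started = {}
    waits = {}
    for raw in dmesg.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        stamp, text = float(m.group(1)), m.group(2)
        stuck = STUCK.search(text)
        if stuck:
            cpu = int(stuck.group(1))
            if cpu in started:
                waits[cpu] = round(stamp - started[cpu], 3)
            continue
        for n in START.findall(text):
            started[int(n)] = stamp
    return waits


print(stuck_cpus(DMESG))
```

For both cpu1 and cpu7 the gap comes out at roughly 5 seconds, which looks like a fixed boot timeout in the guest expiring rather than the CPUs failing at different points.
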
Thanks,
zhanghailiang
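
The host-side observation (some vCPU threads consuming no CPU time) can be re-checked with a small script like the one below. It is a rough sketch, assuming a Linux host with /proc mounted; the helper names `parse_cpu_ticks`, `thread_ticks` and `idle_threads` are hypothetical. It samples utime + stime (fields 14 and 15 of /proc/<pid>/task/<tid>/stat) for every thread of the QEMU process twice and lists the threads whose counters did not move.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one task stat line."""
    # The comm field is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def thread_ticks(pid):
    """Map each thread id of `pid` to its accumulated CPU ticks."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except OSError:
            pass  # the thread exited between listdir() and open()
    return ticks


def idle_threads(pid, interval=5.0):
    """Return thread ids whose CPU ticks did not change over `interval` seconds."""
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    return sorted(t for t in before if t in after and after[t] == before[t])


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python idle_threads.py <qemu-pid>
    print(idle_threads(int(sys.argv[1])))
```

The thread ids it prints still have to be matched to vCPU indexes by hand, for example against the thread_id values shown by 'info cpus' in the QEMU monitor. A thread that is idle here is not necessarily stuck: a halted vCPU blocked in kvm_vcpu_block looks the same from this point of view.
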