From mboxrd@z Thu Jan  1 00:00:00 1970
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after
 reboot
Date: Mon, 6 Jul 2015 15:54:20 +0800
Message-ID: <559A342C.6020207@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: <peter.huangpeng@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
To: <kvm@vger.kernel.org>
Return-path: <kvm-owner@vger.kernel.org>
Received: from szxga01-in.huawei.com ([58.251.152.64]:59417 "EHLO
	szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753018AbbGFHye (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 6 Jul 2015 03:54:34 -0400
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Hi,

Recently we encountered a problem in our project: two CPUs in a VM are not brought up normally after reboot.

Our host is using KVM kmod 3.6 and QEMU 2.1.
The guest is a SLES 11 SP3 VM configured with 8 vcpus,
and the cpu model is set to 'host-passthrough'.

After the VM's first boot, everything seemed to be OK.
Then the VM panicked and rebooted.
After the reboot, only 6 cpus were brought up in the VM; cpu1 and cpu7 were not online.

This is the only message we could get from the VM. Its dmesg shows:
[    0.069867] Booting Node   0, Processors  #1
[    5.060042] CPU1: Stuck ??
[    5.060499]  #2
[    5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
[    5.088335] KVM setup async PF for cpu 2
[    5.092967] NMI watchdog enabled, takes one hw-pmu counter.
[    5.094405]  #3
[    5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
[    5.108333] KVM setup async PF for cpu 3
[    5.113553] NMI watchdog enabled, takes one hw-pmu counter.
[    5.114970]  #4
[    5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
[    5.128336] KVM setup async PF for cpu 4
[    5.134576] NMI watchdog enabled, takes one hw-pmu counter.
[    5.135998]  #5
[    5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
[    5.152334] KVM setup async PF for cpu 5
[    5.154764] NMI watchdog enabled, takes one hw-pmu counter.
[    5.156467]  #6
[    5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
[    5.172341] KVM setup async PF for cpu 6
[    5.180738] NMI watchdog enabled, takes one hw-pmu counter.
[    5.182173]  #7 Ok.
[   10.170815] CPU7: Stuck ??
[   10.171648] Brought up 6 CPUs
[   10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
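
For anyone who wants to check a similar guest log quickly, here is a minimal sketch that pulls the stuck CPU numbers and the "Brought up" count out of dmesg text. The function name summarize_boot_log and the shortened sample are only illustrations, not part of any existing tool, and the patterns assume the message wording shown in the log above.

```python
import re

# Patterns assume the exact wording seen in the guest log above.
STUCK_RE = re.compile(r"CPU(\d+): Stuck \?\?")
BROUGHT_UP_RE = re.compile(r"Brought up (\d+) CPUs")


def summarize_boot_log(log_text):
    """Return (stuck_cpu_ids, brought_up_count) parsed from dmesg text.

    brought_up_count is None when no "Brought up N CPUs" line is present.
    """
    stuck = [int(m.group(1)) for m in STUCK_RE.finditer(log_text)]
    match = BROUGHT_UP_RE.search(log_text)
    brought_up = int(match.group(1)) if match else None
    return stuck, brought_up


# Shortened excerpt of the log above, used only as sample input.
sample = """\
[    0.069867] Booting Node   0, Processors  #1
[    5.060042] CPU1: Stuck ??
[    5.182173]  #7 Ok.
[   10.170815] CPU7: Stuck ??
[   10.171648] Brought up 6 CPUs
"""

stuck, brought_up = summarize_boot_log(sample)
print("stuck cpus:", stuck, "brought up:", brought_up)
```

On a live guest the same function could be fed the output of dmesg instead of the sample string.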

On the host, we found that the QEMU vcpu1 and vcpu7 threads were not consuming any cpu (they should be in idle state).
All of the vcpus' stacks on the host look like the one below:

[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
[<ffffffffffffffff>] 0xffffffffffffffff

We looked into the kernel code that could lead to the above 'Stuck' warning,
and found that the only possibility is that the emulation of the 'cpuid' instruction in kvm/qemu has something wrong.
But since we can’t reproduce this problem, we are not quite sure.
Is it possible that the cpuid emulation in kvm/qemu has a bug?

Has anyone come across this problem before? Or any ideas?

Thanks,
zhanghailiang



