From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lai Jiangshan
Subject: Re: [Qemu-devel] [PATCH 1/1 V5 tuning] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation
Date: Sun, 16 Oct 2011 23:01:28 +0800
Message-ID: <4E9AF1C8.90102@cn.fujitsu.com>
References: <20110913093835.GB4265@localhost.localdomain>
 <20110914093441.e2bb305c.kamezawa.hiroyu@jp.fujitsu.com>
 <4E705BC3.5000508@cn.fujitsu.com>
 <20110915164704.9cacd407.kamezawa.hiroyu@jp.fujitsu.com>
 <4E71B28F.7030201@cn.fujitsu.com> <4E72F3BA.2000603@jp.fujitsu.com>
 <4E73200A.7040908@jp.fujitsu.com> <4E76C6AA.9080403@cn.fujitsu.com>
 <4E7B04DC.1030407@cn.fujitsu.com> <4E7B4B8F.507@siemens.com>
 <4E7C51E4.2000503@cn.fujitsu.com> <4E7F3585.40108@redhat.com>
 <4E7F635E.6080009@web.de> <4E8035F9.9080908@redhat.com>
 <4E928B54.1070707@cn.fujitsu.com> <4E92958E.9000509@web.de>
 <4E9476E2.1070804@cn.fujitsu.com> <4E948842.4030406@web.de>
 <4E978827.6070008@cn.fujitsu.com> <4E97CE42.9020102@web.de>
 <4E97D85C.7070107@cn.fujitsu.com> <4E97DB62.9020605@web.de>
 <4E980621.9050301@cn.fujitsu.com> <1318593576.2827.18.camel@sasha>
 <4E982600.3020204@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Sasha Levin, "kvm@vger.kernel.org", Avi Kivity, "qemu-devel@nongnu.org", KAMEZAWA Hiroyuki, Kenji Kaneshige
To: Jan Kiszka
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:55797 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753515Ab1JPPAH (ORCPT ); Sun, 16 Oct 2011 11:00:07 -0400
In-Reply-To: <4E982600.3020204@siemens.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 10/14/2011 08:07 PM, Jan Kiszka wrote:
> On 2011-10-14 13:59, Sasha Levin wrote:
>> On Fri, 2011-10-14 at 17:51 +0800, Lai Jiangshan wrote:
>>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>>> button event happens. This doesn't properly emulate real hardware on
>>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>>> the processor even when LINT1 is masked in LVT. For example, this
>>> causes the problem that kdump initiated by NMI sometimes doesn't work
>>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>>
>>> With this patch, we introduce KVM_SET_LINT1,
>>> and we can use KVM_SET_LINT1 to correctly emulate NMI button
>>> without change the old KVM_NMI behavior.
>>>
>>> Signed-off-by: Lai Jiangshan
>>> Reported-by: Kenji Kaneshige
>>> ---
>>
>> It could use a documentation update as well.
>>
>>>  arch/x86/kvm/irq.h   |    1 +
>>>  arch/x86/kvm/lapic.c |    7 +++++++
>>>  arch/x86/kvm/x86.c   |    8 ++++++++
>>>  include/linux/kvm.h  |    3 +++
>>>  4 files changed, 19 insertions(+), 0 deletions(-)
>>> diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
>>> index 53e2d08..0c96315 100644
>>> --- a/arch/x86/kvm/irq.h
>>> +++ b/arch/x86/kvm/irq.h
>>> @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
>>>  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
>>>  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
>>>  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
>>> +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
>>>  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
>>>  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
>>>  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 57dcbd4..87fe36a 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
>>>  		kvm_apic_local_deliver(apic, APIC_LVT0);
>>>  }
>>>
>>> +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>>> +
>>> +	kvm_apic_local_deliver(apic, APIC_LVT1);
>>> +}
>>> +
>>>  static struct kvm_timer_ops lapic_timer_ops = {
>>>  	.is_periodic = lapic_is_periodic,
>>>  };
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 84a28ea..fccd094 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>>>  	case KVM_CAP_XSAVE:
>>>  	case KVM_CAP_ASYNC_PF:
>>>  	case KVM_CAP_GET_TSC_KHZ:
>>> +	case KVM_CAP_SET_LINT1:
>>>  		r = 1;
>>>  		break;
>>>  	case KVM_CAP_COALESCED_MMIO:
>>> @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>>
>>>  		goto out;
>>>  	}
>>> +	case KVM_SET_LINT1: {
>>> +		r = -EINVAL;
>>> +		if (!irqchip_in_kernel(vcpu->kvm))
>>> +			goto out;
>>> +		r = 0;
>>> +		kvm_apic_lint1_deliver(vcpu);
>>
>> We simply ignore the return value of kvm_apic_local_deliver() and assume
>> it always works. why?
>>
>
> Hmm, I suddenly realized that we switched from enhancing the KVM_NMI
> IOCTL to adding KVM_SET_LINT1 - what motivated this?

Enhancing KVM_NMI directly fixes the problem and matches real hardware
more closely, but it changes the API behavior. (This is the approach we
preferred.)

From the previous mails, I understood that you and Avi prefer SET_LINT1,
which keeps the old behavior, and that is also OK for us. I had found it
hard to implement before, and I switched to it once you gave me the hint.

>
> ( Maybe we should let the kernel part settle first before iterating
> through user space changes. )
>

Yes, you are right, we should settle the kernel side first. But I need
your and Avi's suggestions.

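To make the behaviour difference concrete, here is a toy C model of the
two approaches described in the changelog above: the old path marks an
NMI pending on every vCPU, while the LINT1 path checks each vCPU's LVT
entry first. This is only a sketch under stated assumptions, not the KVM
code: struct toy_vcpu and the toy_* functions are hypothetical names
made up for illustration, and the only hardware facts used are the local
APIC LVT layout (mask in bit 16, delivery mode in bits 8-10, 100b meaning
NMI). The deliver helper reports whether anything was injected, which is
the kind of result Sasha's question is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local APIC LVT entry fields (architectural bit positions). */
#define LVT_MASKED    (1u << 16)   /* entry is masked */
#define LVT_MODE_MASK (7u << 8)    /* delivery mode field */
#define LVT_MODE_NMI  (4u << 8)    /* delivery mode 100b = NMI */

/* Hypothetical toy vCPU: just the LINT1 LVT entry and an NMI flag. */
struct toy_vcpu {
	uint32_t lvt_lint1;
	bool nmi_pending;
};

/* Old behaviour: the NMI is injected on every vCPU unconditionally. */
void toy_nmi_blind(struct toy_vcpu *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		vcpus[i].nmi_pending = true;
}

/*
 * LINT1 behaviour for one vCPU: inject only if the entry is unmasked
 * and programmed for NMI delivery. Returns 1 if injected, 0 otherwise.
 */
int toy_lint1_deliver(struct toy_vcpu *v)
{
	if (v->lvt_lint1 & LVT_MASKED)
		return 0;
	if ((v->lvt_lint1 & LVT_MODE_MASK) != LVT_MODE_NMI)
		return 0;
	v->nmi_pending = true;
	return 1;
}

/* NMI button: assert LINT1 on every vCPU, return how many took an NMI. */
int toy_nmi_button(struct toy_vcpu *vcpus, size_t n)
{
	int delivered = 0;

	assert(vcpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		delivered += toy_lint1_deliver(&vcpus[i]);
	return delivered;
}
```

With LINT1 unmasked in NMI mode on CPU0 and masked on the other CPUs,
toy_nmi_button() marks only CPU0, which is the situation kdump expects;
toy_nmi_blind() marks every vCPU.
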
Thanks,
Lai