From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751552AbeAYOQf (ORCPT ); Thu, 25 Jan 2018 09:16:35 -0500
Received: from mx1.redhat.com ([209.132.183.28]:59314 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751388AbeAYOQe (ORCPT ); Thu, 25 Jan 2018 09:16:34 -0500
Date: Thu, 25 Jan 2018 15:16:21 +0100
From: Radim Krčmář
To: Liran Alon
Cc: vkuznets@redhat.com, x86@kernel.org, pbonzini@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Michael S. Tsirkin" , Jason Wang
Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested
Message-ID: <20180125141620.GA7663@flask>
References: <6690c53c-fc99-44ea-9090-6e7438c1bc98@default>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6690c53c-fc99-44ea-9090-6e7438c1bc98@default>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

2018-01-25 01:55-0800, Liran Alon:
> ----- vkuznets@redhat.com wrote:
> > I was investigating an issue with seabios >= 1.10 which stopped working
> > for nested KVM on Hyper-V. The problem appears to be in
> > handle_ept_violation() function: when we do fast mmio we need to skip
> > the instruction so we do kvm_skip_emulated_instruction(). This, however,
> > depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> > However, this is not the case.
> >
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> > some hypervisors follow the spec and don't set it; we end up advancing
> > IP with some random value.
> >
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> >
> > Fix the issue by disabling fast mmio when running nested.
> >
> > Signed-off-by: Vitaly Kuznetsov
> > ---
> >  arch/x86/kvm/vmx.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c829d89e2e63..54afb446f38e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> >  	/*
> >  	 * A nested guest cannot optimize MMIO vmexits, because we have an
> >  	 * nGPA here instead of the required GPA.
> > +	 * Skipping instruction below depends on undefined behavior: Intel's
> > +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> > +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
> > +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> > +	 * up advancing IP with some random value. Disable fast mmio when
> > +	 * running nested and keep it for real hardware in hope that
> > +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>
> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
> I don't think we should do this on real-hardware as-well.

Neither do I, but you can see the last discussion on this topic,
https://patchwork.kernel.org/patch/9903811/.
In short, we've agreed to limit the hack to real hardware and wait for
Intel or virtio changes.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?

(Encode writing instruction length into last 4 bits of MMIO address,
 side-channel say that accesses to the MMIO area always use certain
 instruction length, use hypercall, ...)

Thanks.
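
To make the first idea in the parenthetical concrete, here is a minimal
sketch of what encoding the instruction length into the last 4 bits of
the MMIO address could look like. This is only an illustration, not KVM
or virtio code: the doorbell_* and skip_insn names are hypothetical, and
it assumes the device reserves a 16-byte-aligned window per doorbell so
the low 4 bits are free. Four bits suffice because an x86 instruction is
at most 15 bytes long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in KVM or virtio. */
#define INSN_LEN_MASK 0xfULL	/* x86 instructions are at most 15 bytes long */

/*
 * Guest side: fold the length of the instruction that performs the write
 * into the low 4 bits of a 16-byte-aligned doorbell address.
 */
static inline uint64_t doorbell_encode(uint64_t base, unsigned int insn_len)
{
	assert((base & INSN_LEN_MASK) == 0);		/* low bits must be free */
	assert(insn_len >= 1 && insn_len <= 15);
	return base | insn_len;
}

/* Host side: recover the doorbell base from the faulting address. */
static inline uint64_t doorbell_base(uint64_t gpa)
{
	return gpa & ~INSN_LEN_MASK;
}

/* Host side: recover the instruction length the guest encoded. */
static inline unsigned int doorbell_insn_len(uint64_t gpa)
{
	return (unsigned int)(gpa & INSN_LEN_MASK);
}

/*
 * Host side: advance the instruction pointer by the encoded length
 * instead of reading VM_EXIT_INSTRUCTION_LEN from the VMCS.
 */
static inline uint64_t skip_insn(uint64_t rip, uint64_t gpa)
{
	return rip + doorbell_insn_len(gpa);
}
```

With something along these lines the host would match the fast MMIO
doorbell on doorbell_base(gpa) and skip the instruction via
skip_insn(rip, gpa), so the skip no longer depends on a VMCS field the
spec leaves undefined. The guest would have to use a write instruction
whose encoded length it knows in advance, and the device would have to
tolerate the resulting unaligned addresses.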