Date: Fri, 9 Feb 2024 07:04:57 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References:
  <20240123001555.4168188-1-michal.wilczynski@intel.com>
  <20240125005710.GA8443@yjiang5-mobl.amr.corp.intel.com>
  <42d31df4-2dbf-44db-a511-a2d65324fded@intel.com>
Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction
From: Sean Christopherson
To: Paolo Bonzini
Cc: Michal Wilczynski, Yunhong Jiang, mlevitsk@redhat.com, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
    hpa@zytor.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    dedekind1@gmail.com, yuan.yao@intel.com, Zheyu Ma

On Thu, Feb 08, 2024, Paolo Bonzini wrote:
> On Thu, Feb 8, 2024 at 2:18 PM Wilczynski, Michal wrote:
> > Hi, I've tested the patch and it seems to work, both on Intel and AMD.
> > There was a problem with applying this chunk though:
> >
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index ac8b7614e79d..3d18fa7db353 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -119,7 +119,8 @@ KVM_X86_OP(setup_mce)
> >  #ifdef CONFIG_KVM_SMM
> >  KVM_X86_OP(smi_allowed)
> >  KVM_X86_OP() // <- This shouldn't be there I guess ?
> > -KVM_X86_OP(leave_smm)
> > +KVM_X86_OP(leave_smm_prepare)
> > +KVM_X86_OP(leave_smm_commit)
> >  KVM_X86_OP(enable_smi_window)
> >  #endif
> >  KVM_X86_OP_OPTIONAL(dev_get_attr)
> >
> > Anyway I was a bit averse to this approach as I noticed in the git log
> > that callbacks like e.g. post_leave_smm() used to exist, but they were
> > later removed, so I thought the maintainers don't like introducing extra
> > callbacks.
>
> If they are needed, it's fine. In my opinion a new callback is easier
> to handle and understand than new state.

Yeah, we ripped out post_leave_smm() because its sole usage at the time was
buggy, and having a callback without a purpose would just be dead code.
> > > 2) otherwise, if the problem is that we have not gone through the
> > > vmenter yet, then KVM needs to do that and _then_ inject the triple
> > > fault. The fix is to merge the .triple_fault and .check_nested_events
> > > callbacks, with something like the second attached patch - which
> > > probably has so many problems that I haven't even tried to compile it.
> >
> > Well, in this case if we know that RSM will fail it doesn't seem to me
> > like it makes sense to run vmenter just to kill the VM anyway, this would
> > be more confusing.
>
> Note that the triple fault must not kill the VM, it's just causing a
> nested vmexit from L2 to L1.  KVM's algorithm to inject a
> vmexit-causing event is always to first ensure that the VMCS02 (VMCB02
> for AMD) is consistent, and only then trigger the vmexit.  So if patch
> 2 or something like it works, that would be even better.
>
> > I've made the fix this way based on our discussion with Sean in v1, and
> > tried to mark the RSM instruction with a flag, as one that needs an
> > actual HW VM-Enter to complete successfully, and based on that information
> > manipulate nested_run_pending.

Heh, you misunderstood my suggestion.

 : But due to nested_run_pending being (unnecessarily) buried in vendor structs, it
 : might actually be easier to do a cleaner fix.  E.g. add yet another flag to track
 : that a hardware VM-Enter needs to be completed in order to complete instruction
 : emulation.

I didn't mean add a flag to the emulator to muck with nested_run_pending, I meant
add a flag to kvm_vcpu_arch to be a superset of nested_run_pending.  E.g. as a
first step, something like the below.  And then as a follow-up, see if it's doable
to propagate nested_run_pending => insn_emulation_needs_vmenter so that the
nested_run_pending checks in {svm,vmx}_{interrupt,nmi,smi}_allowed() can be
dropped.
---
 arch/x86/include/asm/kvm_host.h |  8 ++++++
 arch/x86/kvm/smm.c              | 10 ++++++--
 arch/x86/kvm/x86.c              | 44 +++++++++++++++++++++++++--------
 3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d271ba20a0b2..bb4250551619 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -769,6 +769,14 @@ struct kvm_vcpu_arch {
 	u64 ia32_misc_enable_msr;
 	u64 smbase;
 	u64 smi_count;
+
+	/*
+	 * Tracks if a successful VM-Enter is needed to complete emulation of
+	 * an instruction, e.g. to ensure emulation of RSM or nested VM-Enter,
+	 * which can directly inject events, completes before KVM attempts to
+	 * inject new events.
+	 */
+	bool insn_emulation_needs_vmenter;
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index dc3d95fdca7d..c6e597b8c794 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -640,8 +640,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 
 #ifdef CONFIG_X86_64
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
-		return rsm_load_state_64(ctxt, &smram.smram64);
+		ret = rsm_load_state_64(ctxt, &smram.smram64);
 	else
 #endif
-		return rsm_load_state_32(ctxt, &smram.smram32);
+		ret = rsm_load_state_32(ctxt, &smram.smram32);
+
+	if (ret != X86EMUL_CONTINUE)
+		return ret;
+
+	vcpu->arch.insn_emulation_needs_vmenter = true;
+	return X86EMUL_CONTINUE;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf10a9073a09..21a7183bbf69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10195,6 +10195,30 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
 	return kvm_x86_ops.nested_ops->check_events(vcpu);
 }
 
+static int kvm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (vcpu->arch.insn_emulation_needs_vmenter)
+		return -EBUSY;
+
+	return static_call(kvm_x86_interrupt_allowed)(vcpu, for_injection);
+}
+
+static int kvm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (vcpu->arch.insn_emulation_needs_vmenter)
+		return -EBUSY;
+
+	return static_call(kvm_x86_smi_allowed)(vcpu, for_injection);
+}
+
+static int kvm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (vcpu->arch.insn_emulation_needs_vmenter)
+		return -EBUSY;
+
+	return static_call(kvm_x86_nmi_allowed)(vcpu, for_injection);
+}
+
 static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -10384,7 +10408,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 	 */
 #ifdef CONFIG_KVM_SMM
 	if (vcpu->arch.smi_pending) {
-		r = can_inject ? static_call(kvm_x86_smi_allowed)(vcpu, true) : -EBUSY;
+		r = can_inject ? kvm_smi_allowed(vcpu, true) : -EBUSY;
 		if (r < 0)
 			goto out;
 		if (r) {
@@ -10398,7 +10422,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 #endif
 
 	if (vcpu->arch.nmi_pending) {
-		r = can_inject ? static_call(kvm_x86_nmi_allowed)(vcpu, true) : -EBUSY;
+		r = can_inject ? kvm_nmi_allowed(vcpu, true) : -EBUSY;
 		if (r < 0)
 			goto out;
 		if (r) {
@@ -10406,14 +10430,14 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 			vcpu->arch.nmi_injected = true;
 			static_call(kvm_x86_inject_nmi)(vcpu);
 			can_inject = false;
-			WARN_ON(static_call(kvm_x86_nmi_allowed)(vcpu, true) < 0);
+			WARN_ON_ONCE(kvm_nmi_allowed(vcpu, true) < 0);
 		}
 		if (vcpu->arch.nmi_pending)
 			static_call(kvm_x86_enable_nmi_window)(vcpu);
 	}
 
 	if (kvm_cpu_has_injectable_intr(vcpu)) {
-		r = can_inject ? static_call(kvm_x86_interrupt_allowed)(vcpu, true) : -EBUSY;
+		r = can_inject ? kvm_interrupt_allowed(vcpu, true) : -EBUSY;
 		if (r < 0)
 			goto out;
 		if (r) {
@@ -10422,7 +10446,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 		if (!WARN_ON_ONCE(irq == -1)) {
 			kvm_queue_interrupt(vcpu, irq, false);
 			static_call(kvm_x86_inject_irq)(vcpu, false);
-			WARN_ON(static_call(kvm_x86_interrupt_allowed)(vcpu, true) < 0);
+			WARN_ON(kvm_interrupt_allowed(vcpu, true) < 0);
 		}
 	}
 	if (kvm_cpu_has_injectable_intr(vcpu))
@@ -10969,6 +10993,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
 			     (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));
 
+		vcpu->arch.insn_emulation_needs_vmenter = false;
+
 		exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
 			break;
@@ -13051,14 +13077,12 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 
 	if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
-	    (vcpu->arch.nmi_pending &&
-	     static_call(kvm_x86_nmi_allowed)(vcpu, false)))
+	    (vcpu->arch.nmi_pending && kvm_nmi_allowed(vcpu, false)))
 		return true;
 
 #ifdef CONFIG_KVM_SMM
 	if (kvm_test_request(KVM_REQ_SMI, vcpu) ||
-	    (vcpu->arch.smi_pending &&
-	     static_call(kvm_x86_smi_allowed)(vcpu, false)))
+	    (vcpu->arch.smi_pending && kvm_smi_allowed(vcpu, false)))
 		return true;
 #endif
 
@@ -13136,7 +13160,7 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	return static_call(kvm_x86_interrupt_allowed)(vcpu, false);
+	return kvm_interrupt_allowed(vcpu, false);
 }
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)

base-commit: f8fe663bc413d2a14ab9a452638a99b975011a9d
-- 