All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Chao Gao <chao.gao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Borislav Petkov <bp@alien8.de>, Xin Li <xin@zytor.com>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
Date: Wed, 4 Jun 2025 06:56:02 -0700	[thread overview]
Message-ID: <aEBQchT0cpCKkmQ6@google.com> (raw)
In-Reply-To: <aD/c6RZvE7a1KSqk@intel.com>

On Wed, Jun 04, 2025, Chao Gao wrote:
> >diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> >index 89a77f0f1cc8..e53020939e60 100644
> >--- a/arch/x86/kvm/svm/nested.c
> >+++ b/arch/x86/kvm/svm/nested.c
> >@@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
> > 	}
> > }
> > 
> >+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
> 
> I understand how the array size (i.e., 9) was determined :). But, adding a
> comment explaining this would be quite helpful 

Yeah, I'll write a comment explaining what all is going on.

> >+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
> >+
> >+int __init nested_svm_init_msrpm_merge_offsets(void)
> >+{
> >+	const u32 merge_msrs[] = {
> >+		MSR_STAR,
> >+		MSR_IA32_SYSENTER_CS,
> >+		MSR_IA32_SYSENTER_EIP,
> >+		MSR_IA32_SYSENTER_ESP,
> >+	#ifdef CONFIG_X86_64
> >+		MSR_GS_BASE,
> >+		MSR_FS_BASE,
> >+		MSR_KERNEL_GS_BASE,
> >+		MSR_LSTAR,
> >+		MSR_CSTAR,
> >+		MSR_SYSCALL_MASK,
> >+	#endif
> >+		MSR_IA32_SPEC_CTRL,
> >+		MSR_IA32_PRED_CMD,
> >+		MSR_IA32_FLUSH_CMD,
> 
> MSR_IA32_DEBUGCTLMSR is missing, but it's benign since it shares the same
> offset as MSR_IA32_LAST* below.

Gah.  Once all is said and done, it's not supposed to be in this list because its
only passed through for SEV-ES guests, but my intent was to keep it in this patch,
and then removeit along with XSS, EFER, PAT, GHCB, and TSC_AUX in the next.

This made me realize that merging in chunks has a novel flaw: if there is an MSR
that KVM *doesn't* want to give L2 access to, then KVM needs to ensure its offset
isn't processed, i.e. that there isn't a "collision" with another MSR.  I don't
think it's a major concern, because the x2APIC MSRs are nicely isolated, and off
the top of my head I can't think of any MSRs that fall into that bucket.  But it's
something worth calling out in a comment, at least.

> I'm a bit concerned that we might overlook adding new MSRs to this array in the
> future, which could lead to tricky bugs. But I have no idea how to avoid this.

Me either.  One option would be to use wrapper macros for the interception helpers
to fill an array at compile time (similar to how kernel exception fixup works),
but (a) it requires modifications to the linker scripts to generate the arrays,
(b) is quite confusing/complex, and (c) it doesn't actually solve the problem,
it just inverts the problem.  Because as above, there are MSRs we *don't* want
to expose to L2, and so we'd need to add yet more code to filter those out.

And the failure mode for the inverted case would be worse, because if we missed
an MSR, KVM would incorrectly give L2 access to an MSR.  Whereas with the current
approach, a missed MSR simply means L2 gets a slower path; but functionally, it's
fine (and it has to be fine, because KVM can't force L1 to disable interception).

> Removing this array and iterating over direct_access_msrs[] directly is an
> option but it contradicts this series as one of its purposes is to remove
> direct_access_msrs[].

Using direct_access_msrs[] wouldn't solve the problem either, because nothing
ensures _that_ array is up-to-date either.

The best idea I have is to add a test that verifies the MSRs that are supposed
to be passed through actually are passed through.  It's still effectively manual
checking, but it would require us to screw up twice, i.e. forget to update both
the array and the test.  The problem is that there's no easy and foolproof way to
verify that an MSR is passed through in a selftest.

E.g. it would be possible to precisely detect L2 => L0 MSR exits via a BPF program,
but incorporating a BPF program into a KVM selftest just to detect exits isn't
something I'm keen on doing (or maintaining).

Using the "exits" stat isn't foolproof due to NMIs (IRQs can be accounted for via
"irq_exits", and to a lesser extent page faults (especially if shadow paging is
in use).

If KVM provided an "msr_exits" stats, it would be trivial to verify interception
via a selftest, but I can't quite convince myself that MSR exits are interesting
enough to warrant their own stat.

> >+		MSR_IA32_LASTBRANCHFROMIP,
> >+		MSR_IA32_LASTBRANCHTOIP,
> >+		MSR_IA32_LASTINTFROMIP,
> >+		MSR_IA32_LASTINTTOIP,
> >+
> >+		MSR_IA32_XSS,
> >+		MSR_EFER,
> >+		MSR_IA32_CR_PAT,
> >+		MSR_AMD64_SEV_ES_GHCB,
> >+		MSR_TSC_AUX,
> >+	};
> 
> 
> > 
> > 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
> >diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >index 1c70293400bc..84dd1f220986 100644
> >--- a/arch/x86/kvm/svm/svm.c
> >+++ b/arch/x86/kvm/svm/svm.c
> >@@ -5689,6 +5689,10 @@ static int __init svm_init(void)
> > 	if (!kvm_is_svm_supported())
> > 		return -EOPNOTSUPP;
> > 
> >+	r = nested_svm_init_msrpm_merge_offsets();
> >+	if (r)
> >+		return r;
> >+
> 
> If the offset array is used for nested virtualization only, how about guarding
> this with nested virtualization? For example, in svm_hardware_setup():

Good idea, I'll do that in the next version.

> 	if (nested) {
> 		r = nested_svm_init_msrpm_merge_offsets();
> 		if (r)
> 			goto err;
> 
> 		pr_info("Nested Virtualization enabled\n");
> 		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
> 	}
> 
> 
> > 	r = kvm_x86_vendor_init(&svm_init_ops);
> > 	if (r)
> > 		return r;
> >diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> >index 909b9af6b3c1..0a8041d70994 100644
> >--- a/arch/x86/kvm/svm/svm.h
> >+++ b/arch/x86/kvm/svm/svm.h
> >@@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
> > 	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
> > }
> > 
> >+int __init nested_svm_init_msrpm_merge_offsets(void);
> >+
> > int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
> > 			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
> > void svm_leave_nested(struct kvm_vcpu *vcpu);
> >-- 
> >2.49.0.1204.g71687c7c1d-goog
> >

  reply	other threads:[~2025-06-04 13:56 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
2025-06-03  7:17   ` Chao Gao
2025-06-03 15:28     ` Sean Christopherson
2025-05-29 23:39 ` [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
2025-06-03  7:18   ` Chao Gao
2025-05-29 23:39 ` [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
2025-06-03  7:57   ` Chao Gao
2025-05-29 23:39 ` [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
2025-06-03  8:06   ` Chao Gao
2025-06-03 13:26     ` Sean Christopherson
2025-05-29 23:39 ` [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
2025-06-03  2:53   ` Mi, Dapeng
2025-05-29 23:39 ` [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
2025-06-03  2:32   ` Mi, Dapeng
2025-05-29 23:39 ` [PATCH 07/28] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
2025-06-04  5:43   ` Chao Gao
2025-06-04 13:56     ` Sean Christopherson [this message]
2025-06-05  8:08       ` Chao Gao
2025-06-04 15:35   ` Paolo Bonzini
2025-06-04 15:49     ` Sean Christopherson
2025-06-04 15:35   ` Paolo Bonzini
2025-05-29 23:39 ` [PATCH 09/28] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
2025-05-29 23:39 ` [PATCH 10/28] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
2025-05-29 23:39 ` [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
2025-06-04 16:11   ` Paolo Bonzini
2025-06-04 17:35     ` Sean Christopherson
2025-06-04 19:13       ` Paolo Bonzini
2025-05-29 23:39 ` [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
2025-06-05  8:19   ` Chao Gao
2025-05-29 23:39 ` [PATCH 13/28] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
2025-05-29 23:39 ` [PATCH 14/28] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
2025-05-29 23:40 ` [PATCH 15/28] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
2025-05-30 23:38   ` Xin Li
2025-06-02 17:10     ` Sean Christopherson
2025-06-03  3:52   ` Mi, Dapeng
2025-06-05  6:59   ` Chao Gao
2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
2025-05-31 18:01   ` Francesco Lavra
2025-06-02 17:08     ` Sean Christopherson
2025-06-05  6:24   ` Chao Gao
2025-06-05 16:39     ` Sean Christopherson
2025-05-29 23:40 ` [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
2025-05-30 23:47   ` Xin Li
2025-05-29 23:40 ` [PATCH 19/28] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
2025-05-29 23:40 ` [PATCH 20/28] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
2025-05-29 23:40 ` [PATCH 21/28] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
2025-05-29 23:40 ` [PATCH 22/28] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
2025-05-29 23:40 ` [PATCH 23/28] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
2025-05-29 23:40 ` [PATCH 24/28] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
2025-05-29 23:40 ` [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
2025-06-04 16:19   ` Paolo Bonzini
2025-06-04 16:28     ` Sean Christopherson
2025-05-29 23:40 ` [PATCH 26/28] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
2025-05-29 23:40 ` [PATCH 27/28] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
2025-05-29 23:40 ` [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
2025-06-03  5:47   ` Mi, Dapeng
2025-06-04  4:43     ` Manali Shukla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aEBQchT0cpCKkmQ6@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=xin@zytor.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.