From: Sean Christopherson <seanjc@google.com>
To: Kevin Cheng <chengkev@google.com>
Cc: pbonzini@redhat.com, jmattson@google.com, yosry.ahmed@linux.dev,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: SVM: Don't allow L1 intercepts for instructions not advertised
Date: Tue, 9 Dec 2025 08:52:53 -0800 [thread overview]
Message-ID: <aThT5d5WdMSszN9b@google.com> (raw)
In-Reply-To: <20251205070630.4013452-1-chengkev@google.com>
On Fri, Dec 05, 2025, Kevin Cheng wrote:
> If a feature is not advertised in the guest's CPUID, prevent L1 from
> intercepting the unsupported instructions by clearing the corresponding
> intercept in KVM's cached vmcb12.
>
> When an L2 guest executes an instruction that is not advertised to L1,
> we expect a #UD exception to be injected by L0. However, the nested svm
> exit handler first checks if the instruction intercept is set in vmcb12,
> and if so, synthesizes an exit from L2 to L1 instead of a #UD exception.
> If a feature is not advertised, the L1 intercept should be ignored.
>
> Calculate the nested intercept mask by checking all instructions that
> can be intercepted and are controlled by a CPUID bit. Use this mask when
> copying from the vmcb12 to KVM's cached vmcb12 to effectively ignore the
> intercept on nested vm exit handling.
>
> Another option is to handle ignoring the L1 intercepts in the nested vm
> exit code path, but I've gone with modifying the cached vmcb12 to keep
> it simpler.
>
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
> arch/x86/kvm/svm/nested.c | 30 +++++++++++++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 2 ++
> arch/x86/kvm/svm/svm.h | 14 ++++++++++++++
> 3 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index c81005b245222..f2ade24908b39 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -184,6 +184,33 @@ void recalc_intercepts(struct vcpu_svm *svm)
> }
> }
>
> +/*
> + * If a feature is not advertised to L1, set the mask bit for the corresponding
> + * vmcb12 intercept.
> + */
> +void svm_recalc_nested_intercepts_mask(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + memset(svm->nested.nested_intercept_mask, 0,
> + sizeof(svm->nested.nested_intercept_mask));
> +
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP))
> + set_nested_intercept_mask(&svm->nested, INTERCEPT_RDTSCP);
> +
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SKINIT))
> + set_nested_intercept_mask(&svm->nested, INTERCEPT_SKINIT);
> +
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> + set_nested_intercept_mask(&svm->nested, INTERCEPT_XSETBV);
> +
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RDPRU))
> + set_nested_intercept_mask(&svm->nested, INTERCEPT_RDPRU);
> +
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID))
> + set_nested_intercept_mask(&svm->nested, INTERCEPT_INVPCID);
Ugh. I don't see any reason for svm->nested.nested_intercept_mask to exist.
guest_cpu_cap_has() is cheap (which is largely why it even exists), just sanitize
the vmcb02 intercepts on-demand. The name is also wonky: it "sets" bits only to
effect a "clear" of those bits.
Gah, and the helpers to access/mutate intercepts can be cleaned up. E.g. if we
do something like this:
static inline void __vmcb_set_intercept(unsigned long *intercepts, u32 bit)
{
WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
__set_bit(bit, intercepts);
}
static inline void __vmcb_clr_intercept(unsigned long *intercepts, u32 bit)
{
WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
__clear_bit(bit, intercepts);
}
static inline bool __vmcb_is_intercept(unsigned long *intercepts, u32 bit)
{
WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
return test_bit(bit, intercepts);
}
static inline void vmcb_set_intercept(struct vmcb_control_area *control, u32 bit)
{
__vmcb_set_intercept((unsigned long *)&control->intercepts, bit);
}
static inline void vmcb_clr_intercept(struct vmcb_control_area *control, u32 bit)
{
__vmcb_clr_intercept((unsigned long *)&control->intercepts, bit);
}
static inline bool vmcb_is_intercept(struct vmcb_control_area *control, u32 bit)
{
return __vmcb_is_intercept((unsigned long *)&control->intercepts, bit);
}
static inline void vmcb12_clr_intercept(struct vmcb_ctrl_area_cached *control, u32 bit)
{
__vmcb_clr_intercept((unsigned long *)&control->intercepts, bit);
}
static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u32 bit)
{
return __vmcb_is_intercept((unsigned long *)&control->intercepts, bit);
}
> +}
> +
> /*
> * This array (and its actual size) holds the set of offsets (indexing by chunk
> * size) to process when merging vmcb12's MSRPM with vmcb01's MSRPM. Note, the
> @@ -408,10 +435,11 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
> struct vmcb_ctrl_area_cached *to,
> struct vmcb_control_area *from)
> {
> + struct vcpu_svm *svm = to_svm(vcpu);
> unsigned int i;
>
> for (i = 0; i < MAX_INTERCEPT; i++)
> - to->intercepts[i] = from->intercepts[i];
> + to->intercepts[i] = from->intercepts[i] & ~(svm->nested.nested_intercept_mask[i]);
Then here we can use vmcb_clr_intercept(). And if with macro shenanigans, we
can cut down on the boilerplate like so:
#define __nested_svm_sanitize_intercept(__vcpu, __control, fname, iname) \
do { \
if (!guest_cpu_cap_has(__vcpu, X86_FEATURE_##fname)) \
vmcb12_clr_intercept(__control, INTERCEPT_##iname); \
} while (0)
#define nested_svm_sanitize_intercept(__vcpu, __control, name) \
__nested_svm_sanitize_intercept(__vcpu, __control, name, name)
static
void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
struct vmcb_ctrl_area_cached *to,
struct vmcb_control_area *from)
{
unsigned int i;
for (i = 0; i < MAX_INTERCEPT; i++)
to->intercepts[i] = from->intercepts[i];
nested_svm_sanitize_intercept(vcpu, to, RDTSCP);
nested_svm_sanitize_intercept(vcpu, to, SKINIT);
__nested_svm_sanitize_intercept(vcpu, to, XSAVE, XSETBV);
nested_svm_sanitize_intercept(vcpu, to, RDPRU);
nested_svm_sanitize_intercept(vcpu, to, INVPCID);
Side topic, do we care about handling the case where userspace sets CPUID after
stuffing guest state? I'm very tempted to send a patch disallowing KVM_SET_CPUID
if is_guest_mode() is true, and hoping no one cares.
>
> to->iopm_base_pa = from->iopm_base_pa;
> to->msrpm_base_pa = from->msrpm_base_pa;
next prev parent reply other threads:[~2025-12-09 16:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-05 7:06 [PATCH] KVM: SVM: Don't allow L1 intercepts for instructions not advertised Kevin Cheng
2025-12-09 1:12 ` Yosry Ahmed
2025-12-09 16:52 ` Sean Christopherson [this message]
2025-12-13 16:11 ` Kevin Cheng
2025-12-16 16:01 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aThT5d5WdMSszN9b@google.com \
--to=seanjc@google.com \
--cc=chengkev@google.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.