Re: [PATCH] KVM: VMX: Add quirk to allow L1 to set FREEZE_IN_SMM in vmcs12

From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	kvm@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: VMX: Add quirk to allow L1 to set FREEZE_IN_SMM in vmcs12
Date: Tue, 3 Feb 2026 18:00:07 -0800
Message-ID: <aYKoJ74MWboBuE_M@google.com>
In-Reply-To: <CALMp9eQx7EVim4iYGbAhoHrei2YmTra6oxtdmKaY7bw-M0PHbw@mail.gmail.com>

On Thu, Jan 22, 2026, Jim Mattson wrote:
> On Tue, Jan 13, 2026 at 7:47 PM Jim Mattson <jmattson@google.com> wrote:
> > On Tue, Jan 13, 2026 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jan 13, 2026, Jim Mattson wrote:
> > > > Add KVM_X86_QUIRK_VMCS12_FREEZE_IN_SMM to allow L1 to set
> > > > IA32_DEBUGCTL.FREEZE_IN_SMM in vmcs12 when using nested VMX.  Prior to
> > > > commit 6b1dd26544d0 ("KVM: VMX: Preserve host's
> > > > DEBUGCTLMSR_FREEZE_IN_SMM while running the guest"), L1 could set
> > > > FREEZE_IN_SMM in vmcs12 to freeze PMCs during physical SMM coincident
> > > > with L2's execution.  The quirk is enabled by default for backwards
> > > > compatibility; userspace can disable it via KVM_CAP_DISABLE_QUIRKS2 if
> > > > consistency with WRMSR(IA32_DEBUGCTL) is desired.
> > >
> > > It's probably worth calling out that KVM will still drop FREEZE_IN_SMM in vmcs02
> > >
> > >         if (vmx->nested.nested_run_pending &&
> > >             (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> > >                 kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> > >                 vmx_guest_debugctl_write(vcpu, vmcs12->guest_ia32_debugctl &
> > >                                                vmx_get_supported_debugctl(vcpu, false)); <====
> > >         } else {
> > >                 kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> > >                 vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
> > >         }
> > >
> > > both from a correctness standpoint and so that users aren't misled into thinking
> > > the quirk gives L1 control of FREEZE_IN_SMM while running L2.
> >
> > Yes, it's probably worth pointing out that the VM is now subject to
> > the whims of the L0 administrators.
> >
> > While that makes some sense for the legacy vPMU, where KVM is just
> > another client of host perf, perhaps the decision should be revisited
> > in the case of the MPT vPMU, where KVM owns the PMU while the vCPU is
> > in VMX non-root operation.

Eh, running guests with FREEZE_IN_SMM=0 seems absolutely crazy from a security
perspective.  If an admin wants to disable FREEZE_IN_SMM, they get to keep the
pieces.  And KVM definitely isn't going to override the admin, e.g. to allow the
guest to profile host SMM.

> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index 0521b55d47a5..bc8f0b3aa70b 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3298,10 +3298,24 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> > > >       if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
> > > >               return -EINVAL;
> > > >
> > > > -     if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> > > > -         (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > > > -          CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
> > > > -             return -EINVAL;
> > > > +     if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) {
> > > > +             u64 debugctl = vmcs12->guest_ia32_debugctl;
> > > > +
> > > > +             /*
> > > > +              * FREEZE_IN_SMM is not virtualized, but allow L1 to set it in
> > > > +              * L2's DEBUGCTL under a quirk for backwards compatibility.
> > > > +              * Prior to KVM taking ownership of the bit to ensure PMCs are
> > > > +              * frozen during physical SMM, L1 could set FREEZE_IN_SMM in
> > > > +              * vmcs12 to freeze PMCs during physical SMM coincident with
> > > > +              * L2's execution.
> > > > +              */
> > > > +             if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_VMCS12_FREEZE_IN_SMM))
> > > > +                     debugctl &= ~DEBUGCTLMSR_FREEZE_IN_SMM;
> > > > +
> > > > +             if (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > > > +                 CC(!vmx_is_valid_debugctl(vcpu, debugctl, false)))
> > >
> > > I'm mildly tempted to say we should quirk the entire consistency check instead of
> > > limiting it to FREEZE_IN_SMM, purely so that we don't have to add yet another quirk
> > > if a different setup breaks on a different bit.  I suppose we could limit the quirk
> > > to bits that could have been plausibly set in hardware, because otherwise VM-Entry
> > > using L2 would VM-Fail, but that's still quite a few bits.
> > >
> > > I'm definitely not opposed to a targeted quirk though.
> >
> > I have no preference.

After mulling over the options from time to time, I think our best bet is to quirk
only FREEZE_IN_SMM, but very explicitly scope the quirk to just the consistency
check.  E.g. maybe KVM_X86_QUIRK_VMCS12_FREEZE_IN_SMM_CC?  That should help alert
readers to the fact that the quirk bypasses the check, but L2 will still see
FREEZE_IN_SMM=0 (e.g. in the unlikely scenario L1 disables interception of
DEBUGCTL).
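
To make the split concrete, here is a rough standalone model of the two steps,
not the actual KVM code.  The bit positions are the architectural IA32_DEBUGCTL
ones, but guest_supported_debugctl is a made-up stand-in for whatever
vmx_get_supported_debugctl() would return for a given guest, and the helper
names are hypothetical.  It only illustrates that the quirk affects the
consistency check, while the value loaded for L2 is masked regardless.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_DEBUGCTL bit positions. */
#define DEBUGCTL_LBR            (1ULL << 0)
#define DEBUGCTL_BTF            (1ULL << 1)
#define DEBUGCTL_FREEZE_IN_SMM  (1ULL << 14)

/*
 * ASSUMPTION: illustrative mask of guest-visible DEBUGCTL bits.  The real set
 * depends on the vCPU model; the point is that FREEZE_IN_SMM is not in it.
 */
static const uint64_t guest_supported_debugctl = DEBUGCTL_LBR | DEBUGCTL_BTF;

/*
 * Model of the nested VM-Enter consistency check.  With the quirk enabled,
 * FREEZE_IN_SMM is ignored before validating the vmcs12 value.
 */
static bool debugctl_check_passes(uint64_t vmcs12_debugctl, bool quirk_enabled)
{
	uint64_t debugctl = vmcs12_debugctl;

	if (quirk_enabled)
		debugctl &= ~DEBUGCTL_FREEZE_IN_SMM;

	return !(debugctl & ~guest_supported_debugctl);
}

/*
 * Model of the value propagated for L2.  Unsupported bits, including
 * FREEZE_IN_SMM, are dropped whether or not the quirk is enabled.
 */
static uint64_t debugctl_for_l2(uint64_t vmcs12_debugctl)
{
	return vmcs12_debugctl & guest_supported_debugctl;
}
```

With vmcs12 DEBUGCTL = LBR | FREEZE_IN_SMM, the check passes only when the quirk
is enabled, and debugctl_for_l2() yields just LBR in either case.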

As for why just FREEZE_IN_SMM, in addition to the fact that FREEZE_IN_SMM is the
only bit that broke anyone (as far as we know, /knock wood), it's also the only
bit that is host-owned.  I.e. unless the host admin likes SMM mucking with things,
skipping the consistency check isn't terrible from a functionality perspective
(KVM doesn't honor the bit for emulated SMM, but that's QEMU's problem :-D).

> Would you like me to post a v2?

Yes please.

Thread overview: 11+ messages
2026-01-13 22:53 [PATCH] KVM: VMX: Add quirk to allow L1 to set FREEZE_IN_SMM in vmcs12 Jim Mattson
2026-01-14  0:42 ` Sean Christopherson
2026-01-14  3:47   ` Jim Mattson
2026-01-22 21:26     ` Jim Mattson
2026-02-04  2:00       ` Sean Christopherson [this message]
2026-02-05  0:42         ` Jim Mattson
2026-02-05  1:18           ` Sean Christopherson
2026-02-05  4:11             ` Jim Mattson
2026-02-05 14:47               ` Sean Christopherson
2026-02-05 17:43                 ` Jim Mattson
2026-02-05 18:16                   ` Sean Christopherson
