From: Sean Christopherson <seanjc@google.com>
To: mlevitsk@redhat.com
Cc: kvm@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Paolo Bonzini <pbonzini@redhat.com>,
x86@kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 3/3] x86: KVM: VMX: preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while in the guest mode
Date: Wed, 7 May 2025 16:03:34 -0700
Message-ID: <aBvmxjxUrXEBa3sc@google.com>
In-Reply-To: <71af8435d2085b3f969cb3e73cff5bfacd243819.camel@redhat.com>
On Thu, May 01, 2025, mlevitsk@redhat.com wrote:
> On Thu, 2025-05-01 at 16:41 -0400, mlevitsk@redhat.com wrote:
> > On Tue, 2025-04-22 at 16:41 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > > > Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> > > > GUEST_IA32_DEBUGCTL without the guest seeing this value.
> > > >
> > > > Note that in the future we might allow the guest to set this bit as well,
> > > > when we implement PMU freezing on VM own, virtual SMM entry.
> > > >
> > > > Since the value of the host DEBUGCTL can in theory change between VM runs,
> > > > check if has changed, and if yes, then reload the GUEST_IA32_DEBUGCTL with
> > > > the new value of the host portion of it (currently only the
> > > > DEBUGCTLMSR_FREEZE_IN_SMM bit)
> > >
> > > No, it can't. DEBUGCTLMSR_FREEZE_IN_SMM can be toggled via IPI callback, but
> > > IRQs are disabled for the entirety of the inner run loop. And if I'm somehow
> > > wrong, this change movement absolutely belongs in a separate patch.
>
>
> Hi,
>
> You are right here - reading MSR_IA32_DEBUGCTLMSR in the inner loop is a
> performance regression.
>
> Any ideas on how to solve this then? Since currently its the common code that
> reads the current value of the MSR_IA32_DEBUGCTLMSR and it doesn't leave any
> indication about if it changed I can do either
>
> 1. store old value as well, something like 'vcpu->arch.host_debugctl_old' Ugly IMHO.
>
> 2. add DEBUG_CTL to the set of the 'dirty' registers, e.g add new bit for kvm_register_mark_dirty
> It looks a bit overkill to me
>
> 3. Add new x86 callback for something like .sync_debugctl(). I vote for this option.
>
> What do you think/prefer?

I was going to say #3 as well, but I think I have a better idea.

DR6 has a similar problem; the guest's value needs to be loaded into hardware,
but only somewhat rarely, and more importantly, never on a fastpath reentry.

Forced immediate exits also have a similar need: some control logic in common x86
needs to instruct kvm_x86_ops.vcpu_run() to do something.

Unless I've misread the DEBUGCTLMSR situation, in all cases, common x86 only needs
a single flag to tell vendor code to do something. The payload for that action
is already available.

So rather than add a bunch of kvm_x86_ops hooks that are only called immediately
before kvm_x86_ops.vcpu_run(), expand @req_immediate_exit into a bitmap of flags
to communicate what work needs to be done, without having to resort to a field
in kvm_vcpu_arch that isn't actually persistent.
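
To make the shape of the idea concrete, here is a rough userspace sketch (not the attached patches). Common code builds a flags word once per run and hands it to the vendor run hook, which acts on it and keeps nothing. The RUN_* flag names, struct run_vcpu, and both function names are hypothetical stand-ins picked purely for illustration; the counters just model "something got written to hardware".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names: one bit per piece of pre-entry work. */
#define RUN_FORCE_IMMEDIATE_EXIT	(1ULL << 0)
#define RUN_LOAD_GUEST_DR6		(1ULL << 1)
#define RUN_LOAD_DEBUGCTL		(1ULL << 2)
#define RUN_ALL_FLAGS	(RUN_FORCE_IMMEDIATE_EXIT | RUN_LOAD_GUEST_DR6 | \
			 RUN_LOAD_DEBUGCTL)

/* Hypothetical stand-in for the vCPU; counters model hardware writes. */
struct run_vcpu {
	uint64_t host_debugctl;		/* host value last seen by common code */
	unsigned int dr6_loads;
	unsigned int debugctl_loads;
	bool immediate_exit;
};

/* Common code: decide what work is needed, without touching hardware. */
static uint64_t common_build_run_flags(struct run_vcpu *vcpu,
				       uint64_t cur_host_debugctl,
				       bool dr6_stale, bool req_immediate_exit)
{
	uint64_t run_flags = 0;

	if (req_immediate_exit)
		run_flags |= RUN_FORCE_IMMEDIATE_EXIT;
	if (dr6_stale)
		run_flags |= RUN_LOAD_GUEST_DR6;

	/* The payload (the new host value) is already stored on the vCPU. */
	if (cur_host_debugctl != vcpu->host_debugctl) {
		vcpu->host_debugctl = cur_host_debugctl;
		run_flags |= RUN_LOAD_DEBUGCTL;
	}
	return run_flags;
}

/* Vendor code: act on the flags; nothing persists past this call. */
static void vendor_vcpu_run(struct run_vcpu *vcpu, uint64_t run_flags)
{
	assert(!(run_flags & ~RUN_ALL_FLAGS));

	if (run_flags & RUN_LOAD_GUEST_DR6)
		vcpu->dr6_loads++;
	if (run_flags & RUN_LOAD_DEBUGCTL)
		vcpu->debugctl_loads++;
	vcpu->immediate_exit = !!(run_flags & RUN_FORCE_IMMEDIATE_EXIT);
}
```

The point of the sketch is that a fastpath reentry would simply pass a flags word with the "load" bits clear, so the rare work never lands on the hot path.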

The attached patches are relatively lightly tested, but the DR6 tests from the
recent bug[*] pass, so hopefully they're correct?

The downside with this approach is that it would be difficult to backport to LTS
kernels, but given how long this has been a problem, I'm not super concerned about
optimizing for backports.

If they look ok, feel free to include them in the next version. Or I can post
them separately if you want.

> > > > + __vmx_set_guest_debugctl(vcpu, vmx->msr_ia32_debugctl);
> > >
> > > I would rather have a helper that explicitly writes the VMCS field, not one that
> > > sets the guest value *and* writes the VMCS field.
> >
> > >
> > > The usage in init_vmcs() doesn't need to write vmx->msr_ia32_debugctl because the
> > > vCPU is zero allocated, and this usage doesn't change vmx->msr_ia32_debugctl.
> > > So the only path that actually needs to modify vmx->msr_ia32_debugctl is
> > > vmx_set_guest_debugctl().
> >
> > But what about nested entry? nested entry pretty much sets the MSR to a
> > value given by the guest.
> >
> > Also technically the intel_pmu_legacy_freezing_lbrs_on_pmi also changes the
> > guest value by emulating what the real hardware does.
Drat, sorry, my feedback was way too terse. What I was trying to say is that if
we cache the guest's msr_ia32_debugctl, then I would rather have this:
--
static void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	/* Guest value, plus the host-owned FREEZE_IN_SMM bit. */
	u64 val = vmx->msr_ia32_debugctl |
		  (vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM);

	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}

int vmx_set_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);

	if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
		kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
		data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
	}

	if (invalid)
		return 1;

	if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
					VM_EXIT_SAVE_DEBUG_CONTROLS))
		get_vmcs12(vcpu)->guest_ia32_debugctl = data;

	if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
	    (data & DEBUGCTLMSR_LBR))
		intel_pmu_create_guest_lbr_event(vcpu);

	/* Cache the guest value, then push it (plus host bits) to the VMCS. */
	vmx->msr_ia32_debugctl = data;
	vmx_guest_debugctl_write(vcpu);
	return 0;
}
--
So that the path that refreshes vmcs.GUEST_IA32_DEBUGCTL on VM-Entry doesn't have
to feed in vmx->msr_ia32_debugctl, because the only value that is ever written to
hardware is vmx->msr_ia32_debugctl.
However, I'm not entirely convinced that we need to cache the guest value,
because toggling DEBUGCTLMSR_FREEZE_IN_SMM should be extremely rare. So something
like this?
--
static void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
	/* Fold in the host-owned FREEZE_IN_SMM bit before writing the VMCS. */
	val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}

int vmx_set_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
	u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);

	if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
		kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
		data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
	}

	if (invalid)
		return 1;

	if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
					VM_EXIT_SAVE_DEBUG_CONTROLS))
		get_vmcs12(vcpu)->guest_ia32_debugctl = data;

	if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
	    (data & DEBUGCTLMSR_LBR))
		intel_pmu_create_guest_lbr_event(vcpu);

	vmx_guest_debugctl_write(vcpu, data);
	return 0;
}
--
And then when DEBUGCTLMSR_FREEZE_IN_SMM changes:

	if (<is DEBUGCTLMSR_FREEZE_IN_SMM toggled>)
		vmx_guest_debugctl_write(vcpu, vmcs_read64(GUEST_IA32_DEBUGCTL) &
					       ~DEBUGCTLMSR_FREEZE_IN_SMM);

And the LBR crud doesn't need to call into the "full" vmx_set_debugctl() (or we
don't even need that helper?).
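
For what it's worth, the uncached variant can be modeled in userspace to sanity check the bit manipulation. This is only a sketch: struct dbg_vcpu and host_debugctl_changed() are hypothetical names, the vmcs_guest_debugctl field stands in for vmcs.GUEST_IA32_DEBUGCTL, and the assert encodes my assumption that callers only ever pass guest-owned bits. The bit positions match the kernel's DEBUGCTLMSR_LBR (bit 0) and DEBUGCTLMSR_FREEZE_IN_SMM (bit 14).

```c
#include <assert.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR			(1ULL << 0)
#define DEBUGCTLMSR_FREEZE_IN_SMM	(1ULL << 14)

/* Hypothetical stand-in for the vCPU and its VMCS field. */
struct dbg_vcpu {
	uint64_t host_debugctl;		/* models vcpu->arch.host_debugctl */
	uint64_t vmcs_guest_debugctl;	/* models vmcs.GUEST_IA32_DEBUGCTL */
};

/* Models vmx_guest_debugctl_write(): guest bits plus the host-owned bit. */
static void guest_debugctl_write(struct dbg_vcpu *vcpu, uint64_t val)
{
	/* Sketch assumption: callers pass only guest-owned bits. */
	assert(!(val & DEBUGCTLMSR_FREEZE_IN_SMM));

	val |= vcpu->host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
	vcpu->vmcs_guest_debugctl = val;
}

/* Models the "FREEZE_IN_SMM toggled" path: no cached guest value needed. */
static void host_debugctl_changed(struct dbg_vcpu *vcpu, uint64_t new_host)
{
	uint64_t toggled = (vcpu->host_debugctl ^ new_host) &
			   DEBUGCTLMSR_FREEZE_IN_SMM;

	vcpu->host_debugctl = new_host;
	if (!toggled)
		return;

	/* Recover the guest bits from the field itself, then rewrite it. */
	guest_debugctl_write(vcpu, vcpu->vmcs_guest_debugctl &
				   ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
```

The guest's bits survive a host toggle because they are read back from the field itself, which is why the cache looks optional as long as toggles stay rare.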
Side topic, we really should be able to drop @host_initiated, because KVM's ABI
is effectively that CPUID must be set before MSRs, i.e. allowing the host to stuff
unsupported bits isn't necessary. But that's a future problem.
Thread overview: 21+ messages
2025-04-16 0:25 [PATCH 0/3] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough Maxim Levitsky
2025-04-16 0:25 ` [PATCH 1/3] x86: KVM: VMX: Wrap GUEST_IA32_DEBUGCTL read/write with access functions Maxim Levitsky
2025-04-22 23:33 ` Sean Christopherson
2025-05-01 20:35 ` mlevitsk
2025-05-07 17:18 ` Sean Christopherson
2025-05-13 0:34 ` mlevitsk
2025-05-14 14:28 ` Sean Christopherson
2025-04-23 9:51 ` Mi, Dapeng
2025-05-01 20:34 ` mlevitsk
2025-05-07 5:17 ` Mi, Dapeng
2025-04-16 0:25 ` [PATCH 2/3] x86: KVM: VMX: cache guest written value of MSR_IA32_DEBUGCTL Maxim Levitsky
2025-04-16 0:25 ` [PATCH 3/3] x86: KVM: VMX: preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while in the guest mode Maxim Levitsky
2025-04-22 23:41 ` Sean Christopherson
2025-05-01 20:41 ` mlevitsk
2025-05-01 20:53 ` mlevitsk
2025-05-07 5:27 ` Mi, Dapeng
2025-05-07 14:31 ` mlevitsk
2025-05-07 23:03 ` Sean Christopherson [this message]
2025-05-08 13:35 ` Sean Christopherson
2025-05-15 0:19 ` mlevitsk
2025-04-23 10:10 ` Mi, Dapeng