From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: mlevitsk@redhat.com, Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Paolo Bonzini <pbonzini@redhat.com>,
x86@kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 3/3] x86: KVM: VMX: preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while in the guest mode
Date: Wed, 7 May 2025 13:27:56 +0800
Message-ID: <181eae79-735d-414e-9a46-caa321602204@linux.intel.com>
In-Reply-To: <71af8435d2085b3f969cb3e73cff5bfacd243819.camel@redhat.com>
On 5/2/2025 4:53 AM, mlevitsk@redhat.com wrote:
> On Thu, 2025-05-01 at 16:41 -0400, mlevitsk@redhat.com wrote:
>> On Tue, 2025-04-22 at 16:41 -0700, Sean Christopherson wrote:
>>> On Tue, Apr 15, 2025, Maxim Levitsky wrote:
>>>> Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to the guest
>>>> GUEST_IA32_DEBUGCTL without the guest seeing this value.
>>>>
>>>> Note that in the future we might allow the guest to set this bit as well,
>>>> when we implement PMU freezing on the VM's own virtual SMM entry.
>>>>
>>>> Since the value of the host DEBUGCTL can in theory change between VM runs,
>>>> check if it has changed, and if yes, then reload the GUEST_IA32_DEBUGCTL with
>>>> the new value of the host portion of it (currently only the
>>>> DEBUGCTLMSR_FREEZE_IN_SMM bit).
>>> No, it can't. DEBUGCTLMSR_FREEZE_IN_SMM can be toggled via IPI callback, but
>>> IRQs are disabled for the entirety of the inner run loop. And if I'm somehow
>>> wrong, this change movement absolutely belongs in a separate patch.
>
> Hi,
>
> You are right here - reading MSR_IA32_DEBUGCTLMSR in the inner loop is a performance
> regression.
>
>
> Any ideas on how to solve this then? Since currently it's the common code that
> reads the current value of the MSR_IA32_DEBUGCTLMSR, and it doesn't leave any indication
> of whether it changed, I can do one of the following:
>
> 1. Store the old value as well, something like 'vcpu->arch.host_debugctl_old'. Ugly IMHO.
>
> 2. Add DEBUG_CTL to the set of the 'dirty' registers, e.g. add a new bit for kvm_register_mark_dirty.
> It looks a bit overkill to me.
>
> 3. Add a new x86 callback for something like .sync_debugctl(). I vote for this option.
>
> What do you think/prefer?
Hmm, not sure if I missed something, but why move the read of the host
DEBUGCTL MSR from its original place into the inner loop? In the original
code, interrupts are already disabled before the host DEBUGCTL is read, so I
suppose the host DEBUGCTL can't change after it has been read?
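
To make the point concrete, here is a minimal user-space sketch (not kernel
code) of the merge done in __vmx_set_guest_debugctl() and of the check Sean
asked for, i.e. reload only when a host-preserved bit has toggled. The names
HOST_PRESERVED_DEBUGCTL, merge_debugctl() and host_preserved_bits_changed()
are hypothetical and exist only for illustration; the bit position (14)
matches DEBUGCTLMSR_FREEZE_IN_SMM_BIT in msr-index.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 14 of IA32_DEBUGCTL, mirroring DEBUGCTLMSR_FREEZE_IN_SMM in the kernel. */
#define DEBUGCTLMSR_FREEZE_IN_SMM (UINT64_C(1) << 14)

/* Hypothetical mask of host bits that must be preserved while the guest runs. */
#define HOST_PRESERVED_DEBUGCTL DEBUGCTLMSR_FREEZE_IN_SMM

/*
 * Value that would end up in GUEST_IA32_DEBUGCTL: the preserved bits come
 * from the host snapshot, everything else from the guest-written value.
 */
static uint64_t merge_debugctl(uint64_t host, uint64_t guest)
{
	return (host & HOST_PRESERVED_DEBUGCTL) |
	       (guest & ~HOST_PRESERVED_DEBUGCTL);
}

/* True only if a host-preserved bit differs between the two snapshots. */
static bool host_preserved_bits_changed(uint64_t old_host, uint64_t new_host)
{
	return ((old_host ^ new_host) & HOST_PRESERVED_DEBUGCTL) != 0;
}
```

With such a check, a host change that only touches e.g. the LBR bit would not
cause a VMCS write, and if the snapshot is taken once with IRQs disabled
before the inner loop, there is nothing to compare inside the loop at all.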
>
> Best regards,
> Maxim Levitsky
>
>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>> ---
>>>> arch/x86/kvm/svm/svm.c | 2 ++
>>>> arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++++++++-
>>>> arch/x86/kvm/x86.c | 2 --
>>>> 3 files changed, 29 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index cc1c721ba067..fda0660236d8 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -4271,6 +4271,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>>>> svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
>>>> svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
>>>>
>>>> + vcpu->arch.host_debugctl = get_debugctlmsr();
>>>> +
>>>> /*
>>>> * Disable singlestep if we're injecting an interrupt/exception.
>>>> * We don't want our modified rflags to be pushed on the stack where
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index c9208a4acda4..e0bc31598d60 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -2194,6 +2194,17 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
>>>> return debugctl;
>>>> }
>>>>
>>>> +static u64 vmx_get_host_preserved_debugctl(struct kvm_vcpu *vcpu)
>>> No, just open code handling DEBUGCTLMSR_FREEZE_IN_SMM, or make it a #define.
>>> I'm not remotely convinced that we'll ever want to emulate DEBUGCTLMSR_FREEZE_IN_SMM,
>>> and trying to plan for that possibility adds complexity for no immediate value.
>> Hi,
>>
>> The problem here is a bit different: we indeed are very unlikely to emulate the
>> DEBUGCTLMSR_FREEZE_IN_SMM. However, when I wrote this patch I was sure that this bit is
>> mandatory with PMU version 2 or later, but it looks like it is optional after all:
>>
>> "
>> Note that system software must check if the processor supports the IA32_DEBUGCTL.FREEZE_WHILE_SMM
>> control bit. IA32_DEBUGCTL.FREEZE_WHILE_SMM is supported if
>> IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] is reporting 1. See Section 20.8 for details of detecting the presence of
>> IA32_PERF_CAPABILITIES MSR."
>>
>> KVM indeed doesn't set the bit 12 of IA32_PERF_CAPABILITIES.
>>
>> However, note that the Linux kernel silently sets this bit without checking the aforementioned capability
>> bit and ends up with a #GP exception, which it silently ignores.... (I checked this with a trace...)
>>
>> This led me to believe that this bit should be unconditionally supported,
>> meaning that KVM should at least fake setting it without triggering a #GP.
>>
>> Since that is not the case, I can revert to the simpler model of exclusively using GUEST_IA32_DEBUGCTL
>> while hiding the bit from the guest, however I do vote to keep the guest/host separation.
>>
>>>> +{
>>>> + /*
>>>> + * Bits of host's DEBUGCTL that we should preserve while the guest is
>>>> + * running.
>>>> + *
>>>> + * Some of those bits might still be emulated for the guest own use.
>>>> + */
>>>> + return DEBUGCTLMSR_FREEZE_IN_SMM;
>>>> +}
>>>> +
>>>> u64 vmx_get_guest_debugctl(struct kvm_vcpu *vcpu)
>>>> {
>>>> return to_vmx(vcpu)->msr_ia32_debugctl;
>>>> @@ -2202,9 +2213,11 @@ u64 vmx_get_guest_debugctl(struct kvm_vcpu *vcpu)
>>>> static void __vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data)
>>>> {
>>>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> + u64 host_mask = vmx_get_host_preserved_debugctl(vcpu);
>>>>
>>>> vmx->msr_ia32_debugctl = data;
>>>> - vmcs_write64(GUEST_IA32_DEBUGCTL, data);
>>>> + vmcs_write64(GUEST_IA32_DEBUGCTL,
>>>> + (vcpu->arch.host_debugctl & host_mask) | (data & ~host_mask));
>>>> }
>>>>
>>>> bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
>>>> @@ -2232,6 +2245,7 @@ bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated
>>>> return true;
>>>> }
>>>>
>>>> +
>>> Spurious newline.
>>>
>>>> /*
>>>> * Writes msr value into the appropriate "register".
>>>> * Returns 0 on success, non-0 otherwise.
>>>> @@ -7349,6 +7363,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
>>>> {
>>>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> unsigned long cr3, cr4;
>>>> + u64 old_debugctl;
>>>>
>>>> /* Record the guest's net vcpu time for enforced NMI injections. */
>>>> if (unlikely(!enable_vnmi &&
>>>> @@ -7379,6 +7394,17 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
>>>> vmcs_write32(PLE_WINDOW, vmx->ple_window);
>>>> }
>>>>
>>>> + old_debugctl = vcpu->arch.host_debugctl;
>>>> + vcpu->arch.host_debugctl = get_debugctlmsr();
>>>> +
>>>> + /*
>>>> + * In case the host DEBUGCTL had changed since the last time we
>>>> + * read it, update the guest's GUEST_IA32_DEBUGCTL with
>>>> + * the host's bits.
>>>> + */
>>>> + if (old_debugctl != vcpu->arch.host_debugctl)
>>> This can and should be optimized to only do an update if a host-preserved bit
>>> is toggled.
>> True, I will do this in the next version.
>>
>>>> + __vmx_set_guest_debugctl(vcpu, vmx->msr_ia32_debugctl);
>>> I would rather have a helper that explicitly writes the VMCS field, not one that
>>> sets the guest value *and* writes the VMCS field.
>>> The usage in init_vmcs() doesn't need to write vmx->msr_ia32_debugctl because the
>>> vCPU is zero allocated, and this usage doesn't change vmx->msr_ia32_debugctl.
>>> So the only path that actually needs to modify vmx->msr_ia32_debugctl is
>>> vmx_set_guest_debugctl().
>>
>> But what about nested entry? nested entry pretty much sets the MSR to a value given by the guest.
>>
>> Also technically the intel_pmu_legacy_freezing_lbrs_on_pmi also changes the guest value by emulating what the real hardware does.
>>
>> Best regards,
>> Maxim Levitsky
>>
>>
>>>> +
>>>> /*
>>>> * We did this in prepare_switch_to_guest, because it needs to
>>>> * be within srcu_read_lock.
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 844e81ee1d96..05e866ed345d 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -11020,8 +11020,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>> set_debugreg(0, 7);
>>>> }
>>>>
>>>> - vcpu->arch.host_debugctl = get_debugctlmsr();
>>>> -
>>>> guest_timing_enter_irqoff();
>>>>
>>>> for (;;) {
>>>> --
>>>> 2.26.3
>>>>
>
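
For reference, Sean's comment above about having a helper that only writes
the VMCS field, separate from the one that sets the guest value, could be
sketched roughly as below. This is a user-space mock under stated
assumptions, not a proposal for the actual code: struct fake_vcpu and both
fake_* functions are hypothetical, and the vmcs_debugctl member merely stands
in for the GUEST_IA32_DEBUGCTL field.

```c
#include <stdint.h>

/* Bit 14 of IA32_DEBUGCTL, mirroring DEBUGCTLMSR_FREEZE_IN_SMM in the kernel. */
#define DEBUGCTLMSR_FREEZE_IN_SMM (UINT64_C(1) << 14)

/* Hypothetical stand-in for the state the patch touches. */
struct fake_vcpu {
	uint64_t host_debugctl;  /* snapshot of the host MSR */
	uint64_t guest_debugctl; /* value the guest wrote (vmx->msr_ia32_debugctl) */
	uint64_t vmcs_debugctl;  /* stand-in for the GUEST_IA32_DEBUGCTL field */
};

/* Only refreshes the mock VMCS field; never modifies the guest-visible value. */
static void fake_update_vmcs_debugctl(struct fake_vcpu *v)
{
	v->vmcs_debugctl = (v->host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM) |
			   (v->guest_debugctl & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}

/* The only path that changes the guest-visible value, then refreshes the field. */
static void fake_set_guest_debugctl(struct fake_vcpu *v, uint64_t data)
{
	v->guest_debugctl = data;
	fake_update_vmcs_debugctl(v);
}
```

In this shape, a host-side change only needs fake_update_vmcs_debugctl(),
while guest writes (and, per Maxim's question, possibly nested entry) would
go through fake_set_guest_debugctl().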