public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: mlevitsk@redhat.com
Cc: kvm@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	x86@kernel.org,  Dave Hansen <dave.hansen@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	 linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 3/3] x86: KVM: VMX: preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while in the guest mode
Date: Wed, 7 May 2025 16:03:34 -0700
Message-ID: <aBvmxjxUrXEBa3sc@google.com>
In-Reply-To: <71af8435d2085b3f969cb3e73cff5bfacd243819.camel@redhat.com>

On Thu, May 01, 2025, mlevitsk@redhat.com wrote:
> On Thu, 2025-05-01 at 16:41 -0400, mlevitsk@redhat.com wrote:
> > On Tue, 2025-04-22 at 16:41 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > > > Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> > > > GUEST_IA32_DEBUGCTL without the guest seeing this value.
> > > > 
> > > > Note that in the future we might allow the guest to set this bit as well,
> > > > when we implement PMU freezing on VM own, virtual SMM entry.
> > > > 
> > > > Since the value of the host DEBUGCTL can in theory change between VM runs,
> > > > check if has changed, and if yes, then reload the GUEST_IA32_DEBUGCTL with
> > > > the new value of the host portion of it (currently only the
> > > > DEBUGCTLMSR_FREEZE_IN_SMM bit)
> > > 
> > > No, it can't.  DEBUGCTLMSR_FREEZE_IN_SMM can be toggled via IPI callback, but
> > > IRQs are disabled for the entirety of the inner run loop.  And if I'm somehow
> > > wrong, this change movement absolutely belongs in a separate patch.
> 
> 
> Hi,
> 
> You are right here - reading MSR_IA32_DEBUGCTLMSR in the inner loop is a
> performance regression.
> 
> Any ideas on how to solve this then? Since currently its the common code that
> reads the current value of the MSR_IA32_DEBUGCTLMSR and it doesn't leave any
> indication about if it changed I can do either
> 
> 1. store old value as well, something like 'vcpu->arch.host_debugctl_old' Ugly IMHO.
> 
> 2. add DEBUG_CTL to the set of the 'dirty' registers, e.g add new bit for kvm_register_mark_dirty
> It looks a bit overkill to me
> 
> 3. Add new x86 callback for something like .sync_debugctl(). I vote for this option.
> 
> What do you think/prefer?

I was going to say #3 as well, but I think I have a better idea.

DR6 has a similar problem; the guest's value needs to be loaded into hardware,
but only somewhat rarely, and more importantly, never on a fastpath reentry.

Forced immediate exits have a similar need: some control logic in common x86
needs to instruct kvm_x86_ops.vcpu_run() to do something.

Unless I've misread the DEBUGCTLMSR situation, in all cases, common x86 only needs
a single flag to tell vendor code to do something.  The payload for that action
is already available.

So rather than add a bunch of kvm_x86_ops hooks that are only called immediately
before kvm_x86_ops.vcpu_run(), expand @req_immediate_exit into a bitmap of flags
to communicate what work needs to be done, without having to resort to a field
in kvm_vcpu_arch that isn't actually persistent.
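
To make the shape of that concrete, here is a rough userspace sketch of the
idea, not the attached patches.  The flag names, the vendor_state struct, and
both helpers are hypothetical stand-ins invented for illustration; only the
concept (one bitmap argument replacing the bool @req_immediate_exit) comes from
the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names, for illustration only. */
enum run_flags {
	RUN_FORCE_IMMEDIATE_EXIT = 1u << 0,
	RUN_LOAD_GUEST_DR6       = 1u << 1,
	RUN_LOAD_DEBUGCTL        = 1u << 2,
};

/* Hypothetical stand-in for vendor state touched right before VM-Enter. */
struct vendor_state {
	bool immediate_exit_armed;
	unsigned int dr6_loads;
	unsigned int debugctl_writes;
};

/* Common code: fold the per-entry requests into a single bitmap. */
static uint64_t common_build_run_flags(bool req_immediate_exit,
				       bool load_guest_dr6,
				       bool host_debugctl_changed)
{
	uint64_t flags = 0;

	if (req_immediate_exit)
		flags |= RUN_FORCE_IMMEDIATE_EXIT;
	if (load_guest_dr6)
		flags |= RUN_LOAD_GUEST_DR6;
	if (host_debugctl_changed)
		flags |= RUN_LOAD_DEBUGCTL;
	return flags;
}

/* Stand-in for kvm_x86_ops.vcpu_run(): act only on the requested flags. */
static void vendor_vcpu_run(struct vendor_state *s, uint64_t run_flags)
{
	s->immediate_exit_armed = !!(run_flags & RUN_FORCE_IMMEDIATE_EXIT);

	if (run_flags & RUN_LOAD_GUEST_DR6)
		s->dr6_loads++;
	if (run_flags & RUN_LOAD_DEBUGCTL)
		s->debugctl_writes++;
}
```

A fastpath reentry in this model would simply pass 0 (or only the immediate
exit flag), so none of the rare reload work is repeated.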

The attached patches are relatively lightly tested, but the DR6 tests from the
recent bug[*] pass, so hopefully they're correct?

The downside with this approach is that it would be difficult to backport to LTS
kernels, but given how long this has been a problem, I'm not super concerned about
optimizing for backports.

If they look ok, feel free to include them in the next version.  Or I can post
them separately if you want.

> > > > +		__vmx_set_guest_debugctl(vcpu, vmx->msr_ia32_debugctl);
> > > 
> > > I would rather have a helper that explicitly writes the VMCS field, not one that
> > > sets the guest value *and* writes the VMCS field.
> > 
> > > 
> > > The usage in init_vmcs() doesn't need to write vmx->msr_ia32_debugctl because the
> > > vCPU is zero allocated, and this usage doesn't change vmx->msr_ia32_debugctl.
> > > So the only path that actually needs to modify vmx->msr_ia32_debugctl is
> > > vmx_set_guest_debugctl().
> > 
> > But what about nested entry? nested entry pretty much sets the MSR to a
> > value given by the guest.
> > 
> > Also technically the intel_pmu_legacy_freezing_lbrs_on_pmi also changes the
> > guest value by emulating what the real hardware does.

Drat, sorry, my feedback was way too terse.  What I was trying to say is that if
we cache the guest's msr_ia32_debugctl, then I would rather have this:

--
static void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu)
{
	u64 val = to_vmx(vcpu)->msr_ia32_debugctl |
		  (vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM);

	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}

int vmx_set_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
	u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);

	if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
		kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
		data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
	}

	if (invalid)
		return 1;

	if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
					VM_EXIT_SAVE_DEBUG_CONTROLS))
		get_vmcs12(vcpu)->guest_ia32_debugctl = data;

	if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
	    (data & DEBUGCTLMSR_LBR))
		intel_pmu_create_guest_lbr_event(vcpu);

	to_vmx(vcpu)->msr_ia32_debugctl = data;
	vmx_guest_debugctl_write(vcpu);
	return 0;
}
--

So that the path that refreshes vmcs.GUEST_IA32_DEBUGCTL on VM-Entry doesn't have
to feed in vmx->msr_ia32_debugctl, because the only value that is ever written to
hardware is vmx->msr_ia32_debugctl.

However, I'm not entirely convinced that we need to cache the guest value,
because toggling DEBUGCTLMSR_FREEZE_IN_SMM should be extremely rare.  So something
like this?

--
static void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
	val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;

	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}

int vmx_set_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
	u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);

	if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
		kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
		data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
	}

	if (invalid)
		return 1;

	if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
					VM_EXIT_SAVE_DEBUG_CONTROLS))
		get_vmcs12(vcpu)->guest_ia32_debugctl = data;

	if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
	    (data & DEBUGCTLMSR_LBR))
		intel_pmu_create_guest_lbr_event(vcpu);

	vmx_guest_debugctl_write(vcpu, data);
	return 0;
}
--

And then when DEBUGCTLMSR_FREEZE_IN_SMM changes:

	if (<is DEBUGCTLMSR_FREEZE_IN_SMM toggled>)
		vmx_guest_debugctl_write(vcpu, vmcs_read64(GUEST_IA32_DEBUGCTL) &
					       ~DEBUGCTLMSR_FREEZE_IN_SMM);

And the LBR crud doesn't need to call into the "full" vmx_set_debugctl() (or we
don't even need that helper?).
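
For reference, a toy userspace model of the no-cache variant, purely to sanity
check the bit handling.  struct toy_vcpu and the toy_*() helpers are
hypothetical stand-ins for the VMCS field and vcpu->arch.host_debugctl; the bit
positions (LBR = bit 0, FREEZE_IN_SMM = bit 14) match msr-index.h.  Hiding the
host bit on reads follows the "without the guest seeing this value" wording in
the changelog and is an assumption about how the read side would look.

```c
#include <stdint.h>

#define DEBUGCTLMSR_LBR           (1ULL << 0)
#define DEBUGCTLMSR_FREEZE_IN_SMM (1ULL << 14)

/* Toy stand-ins for vmcs.GUEST_IA32_DEBUGCTL and vcpu->arch.host_debugctl. */
struct toy_vcpu {
	uint64_t vmcs_guest_debugctl;
	uint64_t host_debugctl;
};

/* Merge the host-owned bit into whatever value the guest wants. */
static void toy_guest_debugctl_write(struct toy_vcpu *v, uint64_t val)
{
	val |= v->host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
	v->vmcs_guest_debugctl = val;
}

/* Guest-visible value: the host-owned bit is masked out. */
static uint64_t toy_guest_debugctl_read(const struct toy_vcpu *v)
{
	return v->vmcs_guest_debugctl & ~DEBUGCTLMSR_FREEZE_IN_SMM;
}

/* Rewrite the field only if the host toggled FREEZE_IN_SMM. */
static void toy_refresh_host_bits(struct toy_vcpu *v, uint64_t new_host)
{
	uint64_t changed = v->host_debugctl ^ new_host;

	v->host_debugctl = new_host;
	if (changed & DEBUGCTLMSR_FREEZE_IN_SMM)
		toy_guest_debugctl_write(v, toy_guest_debugctl_read(v));
}
```

The point of the model is that the guest's own bits survive a toggle in either
direction without a separate cached copy, since they can always be recovered
from the field itself by masking off the host-owned bit.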

Side topic, we really should be able to drop @host_initiated, because KVM's ABI
is effectively that CPUID must be set before MSRs, i.e. allowing the host to stuff
unsupported bits isn't necessary.  But that's a future problem.
