Re: [PATCH v5 4/5] KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Manali Shukla <manali.shukla@amd.com>,
	kvm@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	pbonzini@redhat.com, nikunj@amd.com,  bp@alien8.de
Subject: Re: [PATCH v5 4/5] KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs
Date: Tue, 3 Feb 2026 06:58:08 -0800	[thread overview]
Message-ID: <aYINAAGPto_TqLt-@google.com> (raw)
In-Reply-To: <peroaux2ghnb2ypg2ebzflb3xvg3bzpaircqht3vdgy7tkrwn4@pfpkfhasn44i>

On Mon, Feb 02, 2026, Yosry Ahmed wrote:
> On Fri, May 02, 2025 at 05:03:45AM +0000, Manali Shukla wrote:
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 834b67672d50..5369d9517fbb 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -678,6 +678,33 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> >  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
> >  	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
> >  
> > +	/*
> > +	 * Stash vmcb02's counter if the guest hasn't moved past the guilty
> > +	 * instrution; otherwise, reset the counter to '0'.
> > +	 *
> > +	 * In order to detect if L2 has made forward progress or not, track the
> > +	 * RIP at which a bus lock has occurred on a per-vmcb12 basis.  If RIP
> > +	 * is changed, guest has clearly made forward progress, bus_lock_counter
> > +	 * still remained '1', so reset bus_lock_counter to '0'. Eg. In the
> > +	 * scenario, where a buslock happened in L1 before VMRUN, the bus lock
> > +	 * firmly happened on an instruction in the past. Even if vmcb01's
> > +	 * counter is still '1', (because the guilty instruction got patched),
> > +	 * the vCPU has clearly made forward progress and so KVM should reset
> > +	 * vmcb02's counter to '0'.
> > +	 *
> > +	 * If the RIP hasn't changed, stash the bus lock counter at nested VMRUN
> > +	 * to prevent the same guilty instruction from triggering a VM-Exit. Eg.
> > +	 * if userspace rate-limits the vCPU, then it's entirely possible that
> > +	 * L1's tick interrupt is pending by the time userspace re-runs the
> > +	 * vCPU.  If KVM unconditionally clears the counter on VMRUN, then when
> > +	 * L1 re-enters L2, the same instruction will trigger a VM-Exit and the
> > +	 * entire cycle start over.
> > +	 */
> > +	if (vmcb02->save.rip && (svm->nested.ctl.bus_lock_rip == vmcb02->save.rip))
> 
> Is vmcb02->save.rip the right thing to use here?

Objection, leading the witness your honor.

> IIUC, we want to find out if L2 made forward progress since the last bus
> lock #VMEXIT from L2 to L0

More or less.
  
>, at which point we record bus_lock_rip.
> However, on nested VMRUN, vmcb02->save.rip is only populated from the
> vmcb12 in nested_vmcb02_prepare_save(), which doesn't happen until after
> nested_vmcb02_prepare_control(). So waht we're comparing against here is
> L2's RIP last time KVM ran it.

Hrm, that definitely seems like what past me intended[*], as this quite clearly
suggests my intent was to check the RIP coming from vmcb12:

 : Again, it's imperfect, e.g. if L1 happens to run a different vCPU at the same
 : RIP, then KVM will allow a bus lock for the wrong vCPU.  But the odds of that
 : happening are absurdly low unless L1 is deliberately doing weird things, and so
 : I don't think

I have this vague feeling that I deliberately didn't check @vmcb12_rip, but the
more I look at this, the more I'm convinced I simply didn't notice @vmcb12_rip
when typing up the suggestion.

[*] https://lore.kernel.org/all/Zw6rJ3y_F-10xBcH@google.com

> It's even worse in the KVM_SET_NESTED_STATE path, because
> vmcb02->save.rip will be unintialized (probably zero).

FWIW, this one is arguably ideal, in the sense that KVM should probably assume
the worst if userspace is loading guest state.

> Probably you want to use vmcb12_rip here, as this is the RIP that L1
> wants to run L2 with, and what will end up in vmcb02->save.rip.
>
> HOWEVER, that is also broken in the KVM_SET_NESTED_STATE path. In that
> path we pass in the uninitialized vmcb02->save.rip as vmcb12_rip anyway.


> Fixing this is non-trivial because KVM_SET_REGS could be called before
> or after KVM_SET_NESTED_STATE, so the RIP may not be available at all at
> that point.

Eh, this code is completely meaningless on KVM_SET_NESTED_STATE.  There is no
"right" answer, because KVM has no clue what constitutes forward progress.

Luckily, no answer is completely "wrong" either, because the #VMEXIT is transparent
to L1 and L2.  E.g. it's like saying that sending an IRQ to CPU running a vCPU is
"wrong"; it's never truly "wrong", just suboptimal (sometimes *very* suboptimal).

> We probably want to set a flag in svm_set_nested_state() that the RIP
> needs fixing up, and the perhaps in svm_vcpu_pre_run() fix up the
> control fields in vmcb02 that depend on it: namely the bus_lock_counter,
> next_rip, and soft_int_* fields.

Nah, not unless there's a meaningful, negative impact on a real world setup.
And as above, in practice the only impact is effectively a performance blip due
to generating an unwanted userspace exit.  If userspace is stuffing nested state
so often that that's actually a performance problem, then userspace can set regs
before nested state.

There are a pile of edge cases and "what ifs" around this, none of which have a
perfect answer since KVM doesn't know userspace's intent.  And so I want to go
with a solution that is as simple as possible without risking putting L2 into an
infinite loop.

> It gets worse because I think next_rip is also not always properly
> saved/restored because we do not sync it from vmcb02 to cache in
> nested_sync_control_from_vmcb02() -- but that one is not relevant to
> bus_lock_counter AFAICT.

Correct.

All in all, I think just this?

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd..482092b2051c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -809,7 +809,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
         * L1 re-enters L2, the same instruction will trigger a VM-Exit and the
         * entire cycle start over.
         */
-       if (vmcb02->save.rip && (svm->nested.ctl.bus_lock_rip == vmcb02->save.rip))
+       if (vmcb12_rip && (svm->nested.ctl.bus_lock_rip == vmcb12_rip))
                vmcb02->control.bus_lock_counter = 1;
        else
                vmcb02->control.bus_lock_counter = 0;

next prev parent reply	other threads:[~2026-02-03 14:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-02  5:03 [PATCH v5 0/5] Add support for the Bus Lock Threshold Manali Shukla
2025-05-02  5:03 ` [PATCH v5 1/5] KVM: x86: Make kvm_pio_request.linear_rip a common field for user exits Manali Shukla
2025-05-02  5:03 ` [PATCH v5 2/5] x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold Manali Shukla
2025-05-12 10:46   ` Borislav Petkov
2025-05-02  5:03 ` [PATCH v5 3/5] KVM: SVM: Enable Bus lock threshold exit Manali Shukla
2025-05-02  5:03 ` [PATCH v5 4/5] KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs Manali Shukla
2025-05-17 16:46   ` ALOK TIWARI
2026-02-02 20:44   ` Yosry Ahmed
2026-02-03 14:58     ` Sean Christopherson [this message]
2026-02-03 15:23       ` Yosry Ahmed
2025-05-02  5:03 ` [PATCH v5 5/5] KVM: selftests: Add bus lock exit test Manali Shukla
2025-05-16 19:28   ` Sean Christopherson
2025-05-19 11:45     ` Manali Shukla
2025-05-12  4:47 ` [PATCH v5 0/5] Add support for the Bus Lock Threshold Manali Shukla
2025-05-16 19:01   ` Sean Christopherson
2025-05-19 11:39     ` Manali Shukla
2025-05-20 16:48 ` Sean Christopherson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:de90b104a0d dfblob:482092b2051 )
 OR (
bs:"Re: [PATCH v5 4/5] KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYINAAGPto_TqLt-@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=manali.shukla@amd.com \
    --cc=nikunj@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.