Re: [PATCH] KVM: SVM: Fix SNP AP destroy race with VMRUN

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org,  Paolo Bonzini <pbonzini@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Michael Roth <michael.roth@amd.com>
Subject: Re: [PATCH] KVM: SVM: Fix SNP AP destroy race with VMRUN
Date: Wed, 26 Mar 2025 10:17:45 -0700	[thread overview]
Message-ID: <Z-Q2uQ0perBQiZh-@google.com> (raw)
In-Reply-To: <41bfb025-008c-db03-2f6d-33b2d542ae65@amd.com>

On Wed, Mar 26, 2025, Tom Lendacky wrote:
> On 3/25/25 12:49, Tom Lendacky wrote:
> > On 3/21/25 18:17, Sean Christopherson wrote:
> >> On Fri, Mar 21, 2025, Tom Lendacky wrote:
> >>> On 3/18/25 08:47, Tom Lendacky wrote:
> >>>> On 3/18/25 07:43, Tom Lendacky wrote:
> >>>>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
> >>>>>>> to be annotated with KVM_REQUEST_WAIT.
> >>>>>>
> >>>>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
> >>>>>> This is much simpler. Let me test it out and resend if everything goes ok.
> >>>>>
> >>>>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
> >>>>> me try to track down what is happening with this approach...
> >>>>
> >>>> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
> >>>> plain kvm_make_request() followed by a kvm_vcpu_kick().
> >>
> >> Ugh, I was going to say "you don't need to do that", but I forgot that
> >> kvm_vcpu_kick() subtly doesn't honor KVM_REQUEST_WAIT.
> >>
> >> Ooof, I'm 99% certain that's causing bugs elsewhere.  E.g. arm64's KVM_REQ_SLEEP
> >> uses the same "broken" pattern (LOL, which means that of course RISC-V does too).
> >> In quotes, because kvm_vcpu_kick() is the one that sucks.
> >>
> >> I would rather fix that a bit more directly and obviously.  IMO, converting to
> >> smp_call_function_single() isntead of bastardizing smp_send_reschedule() is worth
> >> doing regardless of the WAIT mess.  This will allow cleaning up a bunch of
> >> make_request+kick pairs, it'll just take a bit of care to make sure we don't
> >> create a WAIT where one isn't wanted (though those probably should have a big fat
> >> comment anyways).

...

> >> @@ -3764,12 +3764,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> >>         if (kvm_arch_vcpu_should_kick(vcpu)) {
> >>                 cpu = READ_ONCE(vcpu->cpu);
> >>                 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> >> -                       smp_send_reschedule(cpu);
> >> +                       smp_call_function_single(cpu, ack_kick, NULL, wait);
> > 
> > In general, this approach works. However, this change triggered
> > 
> >  WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> > 	      && !oops_in_progress);
> > 
> > in kernel/smp.c.

Drat, I forgot that smp_call_function_xxx() can deadlock even if wait=false due
to needing to take locks to set the callback function.

> Is keeping the old behavior desirable when IRQs are disabled? Something
> like:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a6fedcadd036..81cbc55eac3a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3754,8 +3754,14 @@ void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
>  	 */
>  	if (kvm_arch_vcpu_should_kick(vcpu)) {
>  		cpu = READ_ONCE(vcpu->cpu);
> -		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> -			smp_call_function_single(cpu, ack_kick, NULL, wait);
> +		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
> +			WARN_ON_ONCE(wait && irqs_disabled());
> +
> +			if (irqs_disabled())
> +				smp_send_reschedule(cpu);
> +			else
> +				smp_call_function_single(cpu, ack_kick, NULL, wait);
> +		}
>  	}
>  out:
>  	put_cpu();

That, or keying off wait, and letting smp_call_function_xxx() yell about trying
to use it with IRQs disabled, i.e.

			if (wait)
				smp_call_function_single(cpu, ack_kick, NULL, wait);
			else
				smp_send_reschedule(cpu);

My vote would be for the checking "wait", so that the behavior is consistent for
a given request.

next prev parent reply	other threads:[~2025-03-26 17:17 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17 17:20 [PATCH] KVM: SVM: Fix SNP AP destroy race with VMRUN Tom Lendacky
2025-03-17 17:23 ` Tom Lendacky
2025-03-17 17:28   ` Sean Christopherson
2025-03-17 17:36     ` Tom Lendacky
2025-03-18 12:43       ` Tom Lendacky
2025-03-18 13:47         ` Tom Lendacky
2025-03-21 16:52           ` Tom Lendacky
2025-03-21 23:17             ` Sean Christopherson
2025-03-25 17:49               ` Tom Lendacky
2025-03-26 15:34                 ` Tom Lendacky
2025-03-26 17:17                   ` Sean Christopherson [this message]
  -- strict thread matches above, loose matches on Subject: below --
2025-03-21 16:20 Tom Lendacky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-Q2uQ0perBQiZh-@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.