From: Sean Christopherson <seanjc@google.com>
To: Jon Kohler <jon@nutanix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: X86: set EXITING_GUEST_MODE as soon as vCPU exits
Date: Thu, 1 Dec 2022 19:17:15 +0000 [thread overview]
Message-ID: <Y4j9u6YEpJ/px6kj@google.com> (raw)
In-Reply-To: <20221129182226.82087-1-jon@nutanix.com>
On Tue, Nov 29, 2022, Jon Kohler wrote:
> Set vcpu->mode to EXITING_GUEST_MODE as soon vCPU exits to reflect
> that we are indeed exiting guest mode, but not quite out of guest
> mode yet. Note: This is done lazily without an explicit memory
> barrier so that we do not regress the cost in the critical path
> of going from the exit to the exit handler.
This is not remotely sufficient justification. Memory "barriers" are just compiler
barriers on x86, so the odds of regressing the VM-Enter/VM-Exit cost without
introducing a bug are miniscule.
> Flip back to IN_GUEST_MODE for exits that use
> EXIT_FASTPATH_REENTER_GUEST, such that we are IN_GUEST_MODE upon
> reentry.
>
> Changing vcpu->mode away from IN_GUEST_MODE as early as possible
Except this isn't as early as possible. If we're going to bother doing something
like this, my vote is to move it into assembly.
> gives IPI senders as much runway as possible to avoid ringing
> doorbell or sending posted interrupt IPI in AMD and Intel,
> respectively. Since this is done without an explicit memory
> barrier, the worst case is that the IPI sender sees IN_GUEST_MODE
> still and sends a spurious event, which is the behavior prior
> to this patch.
No, worst case scenario is that kvm_vcpu_exiting_guest_mode() sees EXITING_GUEST_MODE
and doesn't kick the vCPU. For "kicks" that set a request, kvm_vcpu_exit_request()
will punt the vCPU out of the tight run loop, though there might be ordering issues.
But whether or not there are ordering issues is a moot point since there are uses
of kvm_vcpu_kick() that aren't accompanied by a request, e.g. to purge the PML
buffers. In other words, kvm_vcpu_kick() absolutely cannot have false negatives.
We could modify KVM to require a request when using kvm_vcpu_kick(), but that's
a bit of a hack, and all of the possible ordering problems is still a pile of
complexity I'd rather avoid.
No small part of me thinks we'd be better off adding a dedicated flag to very
precisely track whether or not the vCPU is truly "in the guest" for the purposes
of sending IPIs. Things like kicks have different requirements around IN_GUEST_MODE
than sending interrupts, e.g. KVM manually processes the IRR on every VM-Enter and
so lack of an IPI is a non-issue, whereas missing an IPI for a kick is problematic.
In other words, EXITING_GUEST_MODE really needs to mean "existing the run loop".
E.g. toggle the dedicated flag within a few instructions of VM-Enter and
VM-Exit for maximum efficiency for interrupts, and avoid having to make vcpu->mode
more complex than it already is.
To add clarity, we could even rename IN_GUEST_MODE and EXITING_GUEST_MODE to
something like IN_RUN_LOOP and EXITING_RUN_LOOP.
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
> arch/x86/kvm/svm/svm.c | 7 +++++++
> arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
> arch/x86/kvm/x86.c | 8 ++++++++
> 3 files changed, 38 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ce362e88a567..5f0c118a3ffd 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3907,6 +3907,13 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
> else
> __svm_vcpu_run(svm, spec_ctrl_intercepted);
>
> + /* Optimize IPI reduction by setting mode immediately after vmexit
/*
* Because KVM isn't the crazy land of net/ block comments should like
* this.
*/
> + * without a memmory barrier as this as not paired anywhere.
> + * is will be set to OUTSIDE_GUEST_MODE in x86 common code with a memory
> + * barrier, after the host is done fully restoring various host states.
> + */
> + vcpu->mode = EXITING_GUEST_MODE;
> +
> guest_state_exit_irqoff();
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 63247c57c72c..243dcb87c727 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5878,6 +5878,17 @@ static fastpath_t handle_fastpath_preemption_timer(struct kvm_vcpu *vcpu)
>
> if (!vmx->req_immediate_exit &&
> !unlikely(vmx->loaded_vmcs->hv_timer_soft_disabled)) {
> + /* Reset IN_GUEST_MODE since we're going to reenter
> + * guest as part of this fast path. This is done as
> + * an optimization without a memory barrier since
> + * EXITING_GUEST_MODE is also set without a memory
Heh, justifying the lack of a memory barrier by saying pointing out the other
code you added doesn't use a memory barrier is interesting, to put it politely.
> + * barrier. This also needs to be reset prior to
> + * calling apic_timer_expired() so that
> + * kvm_use_posted_timer_interrupt() returns the proper
> + * value.
> + */
> + if (vcpu->mode == EXITING_GUEST_MODE)
> + vcpu->mode = IN_GUEST_MODE;
It's far easier, likely more performant, documents why this has a chance of working,
and significantly less error prone to do this unconditionally in either assembly
or after the EXIT_FASTPATH_REENTER_GUEST check in vcpu_enter_guest().
next prev parent reply other threads:[~2022-12-01 19:17 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-29 18:22 [PATCH] KVM: X86: set EXITING_GUEST_MODE as soon as vCPU exits Jon Kohler
2022-11-29 19:47 ` Maxim Levitsky
2022-11-29 19:56 ` Jon Kohler
2022-12-01 15:45 ` Maxim Levitsky
2022-11-30 6:29 ` Chao Gao
2022-11-30 14:07 ` Jon Kohler
2022-12-01 4:55 ` Chao Gao
2022-12-01 14:57 ` Jon Kohler
2022-12-01 19:17 ` Sean Christopherson [this message]
2022-12-05 15:09 ` Jon Kohler
2022-12-07 20:18 ` Sean Christopherson
2022-12-09 8:06 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y4j9u6YEpJ/px6kj@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jon@nutanix.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox