public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] KVM: VMX: Make setup_vmcs_config() preemption disabled
Date: Tue, 7 Mar 2023 09:17:55 -0800	[thread overview]
Message-ID: <ZAdxNgv0M6P63odE@google.com> (raw)
In-Reply-To: <3aec157afb6727e603d80c2232b3718033295f13.camel@intel.com>

On Thu, Mar 02, 2023, Huang, Kai wrote:
> On Thu, 2023-03-02 at 13:36 +0800, Gao, Chao wrote:
> > On Wed, Mar 01, 2023 at 11:54:38PM +1300, Kai Huang wrote:
> > > Make setup_vmcs_config() preemption disabled so it always performs on
> > > the same local cpu.
> > > 
> > > During module loading time, KVM intends to call setup_vmcs_config() to
> > > set up the global VMCS configurations on _one_ cpu in hardware_setup(),

That may have been the very original intention, but I don't think it has been the
true intention for a very long time.

> > > Change the existing setup_vmcs_config() to __setup_vmcs_config() and
> > > call the latter directly in the compatibility check code path.  Change
> > > setup_vmcs_config() to call __setup_vmcs_config() with preemption
> > > disabled so __setup_vmcs_config() is always done on the same cpu.
> > 
> > Maybe you can simply disable preemption in hardware_setup() although I
> > don't have a strong preference.
> > 
> > nested_vmx_setup_ctls_msrs() also reads some MSRs and sets up part of
> > vmcs_conf, should it be called on the same CPU as setup_vmcs_config()?
> 
> Yes I think so.  I missed this :)
> 
> Not sure whether there are other similar places too even outside of
> hardware_setup().
> 
> But compatibility check only checks things calculated via setup_vmcs_config()
> and nested_vmx_setup_ctls_msrs(), so I think it's fair to only put
> hardware_setup() inside preemption disabled.

Disabling preemption across hardware_setup() isn't feasible as there are a number
of allocations that might sleep.  But disabling preemption isn't necessary to
ensure setup runs on one CPU, that only requires disabling _migration_.  So _if_
we want to handle this in the kernel, we could simply do:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 541982de5762..9126fdf02649 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9470,7 +9470,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
        int r;
 
        mutex_lock(&vendor_module_lock);
+       migrate_disable();
        r = __kvm_x86_vendor_init(ops);
+       migrate_enable();
        mutex_unlock(&vendor_module_lock);
 
        return r;


But I'm not convinced we should handle this in the kernel.  Many of the checks,
especially in SVM, query boot_cpu_has(), not this_cpu_has(), i.e. to truly perform
setup on a single CPU, all of those would need to be converted to this_cpu_has().

Some of those boot_cpu_has() calls should be changed regardless of whether or not
migration is disabled, e.g. kvm_is_svm_supported() is arguably straight up buggy
due to cpu_has_svm() checking the boot CPU (I'll fix that by adding a patch after
open coding cpu_has_svm() into kvm_is_svm_supported()[*]).

But things like kvm_timer_init() should NOT be blindlgly converted to this_cpu_has(),
because the teardown path needs to mirror the setup path, e.g. if KVM ended up
running on frankenstein hardware where not all CPUs have a constant TSC, KVM could
leave a callback dangling and hose the kernel.  Obviously such hardware wouldn't
correctly run VMs, but crashing the kernel is a touch worse than KVM not working
correctly.

I'm not totally against converting to this_cpu_has() for the setup, as it would be
more intuitive in a lot of ways.  But, I don't think pinning the task actually
hardens KVM in a meaningful way.  If there are any divergences between CPUs, then
either KVM will notice before running VMs, e.g. the VMCS sanity checks, or KVM will
never notice, e.g. the myriad runtime paths that check boot_cpu_has() (or variants
thereof) without sanity checking across CPUs.  And if userspace _really_ wants to
have guarantees about how setup is performed, e.g. for repeatable, deterministic
behavior, then userspace should force loading of KVM to be done on CPU0.

So my vote is to leave things as-is (modulo the cpu_has_svm() mess).  But maybe add
documentation to explain the caveats about loading KVM, and how userspace can
mitigate those caveats?

[*] https://lore.kernel.org/all/20221201232655.290720-14-seanjc@google.com

  reply	other threads:[~2023-03-07 17:22 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01 10:54 [PATCH] KVM: VMX: Make setup_vmcs_config() preemption disabled Kai Huang
2023-03-02  5:36 ` Chao Gao
2023-03-02  8:39   ` Huang, Kai
2023-03-07 17:17     ` Sean Christopherson [this message]
2023-03-08  1:20       ` Huang, Kai
2023-03-08 21:03         ` Sean Christopherson
2023-03-09  2:11           ` Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZAdxNgv0M6P63odE@google.com \
    --to=seanjc@google.com \
    --cc=chao.gao@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox