From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Chao Gao <chao.gao@intel.com>, Zeng Guang <guang.zeng@intel.com>,
Tom Lendacky <thomas.lendacky@amd.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
Kan Liang <kan.liang@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Kim Phillips <kim.phillips@amd.com>,
Jarkko Sakkinen <jarkko@kernel.org>,
Jethro Beekman <jethro@fortanix.com>,
"Huang, Kai" <kai.huang@intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Hu, Robert" <robert.hu@intel.com>
Subject: Re: [PATCH v5 7/8] KVM: VMX: Update PID-pointer table entry when APIC ID is changed
Date: Thu, 3 Feb 2022 20:22:13 +0000 [thread overview]
Message-ID: <Yfw5ddGNOnDqxMLs@google.com> (raw)
In-Reply-To: <YfsSjvnoQcfzdo68@google.com>
On Wed, Feb 02, 2022, Sean Christopherson wrote:
> On Thu, Jan 13, 2022, Sean Christopherson wrote:
> > On Tue, Jan 11, 2022, Maxim Levitsky wrote:
> > > Both Intel and AMD's PRM also state that changing APIC ID is implementation
> > > dependent.
> > >
> > > I vote to forbid changing apic id, at least in the case any APIC acceleration
> > > is used, be that APICv or AVIC.
> >
> > That has my vote as well. For IPIv in particular there's not much concern with
> > backwards compability, i.e. we can tie the behavior to enable_ipiv.
>
> Hrm, it may not be that simple. There's some crusty (really, really crusty) code
> in Linux's boot code that writes APIC_ID. IIUC, the intent is to play nice with
> running a UP crash dump kernel on "BSP" that isn't "the BSP", e.g. has a non-zero
> APIC ID.
...
> It's entirely possible that this path is unused in a KVM guest, but I don't think
> we can know that with 100% certainty.
>
> But I also completely agree that attempting to keep the tables up-to-date is ugly
> and a waste of time and effort, e.g. as Maxim pointed out, the current AVIC code
> is comically broken.
>
> Rather than disallowing the write, what if we add yet another inhibit that disables
> APICv if IPI virtualization is enabled and a vCPU has an APIC ID != vcpu_id? KVM
> is equipped to handle the emulation, so it just means that a guest that's doing
> weird things loses a big of performance.
LOL, this is all such a mess. The x2apic ID is actually indirectly writable on
AMD CPUs. Per the APM:
A value previously written by software to the 8-bit APIC_ID register (MMIO offset 30h) is
converted by hardware into the appropriate format and reflected into the 32-bit x2APIC_ID
register (MSR 802h).
I confirmed this on hardware (Milan). That means KVM's handling of the x2APIC ID
in kvm_lapic_set_base() is wrong, at least with respect to AMD.
Intel's SDM is a bit vague. I _think_ it means Intel CPUs treat the the x2APIC ID
and xAPIC ID as two completely independent assets. I haven't been able to glean any
info from hardware because writes to the legacy xAPIC ID are ignored on all CPUs
I've tested (Haswell and Cascade lake).
The x2APIC ID (32 bits) and the legacy local xAPIC ID (8 bits) are preserved
across this transition.
Given that the xAPIC ID appears to no longer be writable on Intel CPUs, it's
impossible that _generic_ kernel code can rely on xAPIC ID being writable. That
just leaves the aforementioned amd_numa_init() crud.
Linux's handling of that is:
void __init x86_numa_init(void)
{
if (!numa_off) {
#ifdef CONFIG_ACPI_NUMA
if (!numa_init(x86_acpi_numa_init))
return;
#endif
#ifdef CONFIG_AMD_NUMA
if (!numa_init(amd_numa_init))
return;
#endif
}
numa_init(dummy_numa_init);
}
i.e. ACPI_NUMA gets priority and thus amd_numa_init() will never be reached if
the NUMA topology is enumerated in the ACPI tables. Furthermore, the VMM would
have to actually emulate an old AMD northbridge, which is also extremely unlikely.
The odds of breaking a guest are further diminised given that KVM doesn't emulate
the xAPIC ID => x2APIC ID hilarity on AMD CPUs and no one has complained.
So, rather than tie this to IPI virtualization, I think we should either make
the xAPIC ID read-only across the board, or if we want to hedge in case someone
has a crazy use case, make the xAPIC ID read-only by default, add a module param
to let userspace opt-in to a writable xAPIC ID, and report x2APIC and APICv as
unsupported if the xAPIC ID is writable. E.g. rougly this, plus your AVIC patches
if we want to hedge.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 28be02adc669..32854ac403a8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -539,8 +539,15 @@ void kvm_set_cpu_caps(void)
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
F(F16C) | F(RDRAND)
);
- /* KVM emulates x2apic in software irrespective of host support. */
- kvm_cpu_cap_set(X86_FEATURE_X2APIC);
+ /*
+ * KVM emulates x2apic in software irrespective of host support. Due
+ * to architecturally difference between Intel and AMD, x2APIC is not
+ * supported if the xAPIC ID is writable.
+ */
+ if (!xapic_id_writable)
+ kvm_cpu_cap_set(X86_FEATURE_X2APIC);
+ else
+ kvm_cpu_cap_clear(X86_FEATURE_X2APIC);
kvm_cpu_cap_mask(CPUID_1_EDX,
F(FPU) | F(VME) | F(DE) | F(PSE) |
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 670361bf1d81..6b42b65e7a42 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2047,10 +2047,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
switch (reg) {
case APIC_ID: /* Local APIC ID */
- if (!apic_x2apic_mode(apic))
+ if (apic_x2apic_mode(apic))
+ ret = 1;
+ else if (xapic_id_writable)
kvm_apic_set_xapic_id(apic, val >> 24);
- else
- ret = 1;
break;
case APIC_TASKPRI:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9cef8e4598df..71a3bcdb3317 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4743,7 +4743,8 @@ static __init int svm_hardware_setup(void)
nrips = false;
}
- enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+ enable_apicv = avic = avic && !xapic_id_writable && npt_enabled &&
+ boot_cpu_has(X86_FEATURE_AVIC);
if (enable_apicv) {
pr_info("AVIC enabled\n");
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1b135473677b..fad7b36fbb1d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7939,7 +7939,7 @@ static __init int hardware_setup(void)
ple_window_shrink = 0;
}
- if (!cpu_has_vmx_apicv())
+ if (!cpu_has_vmx_apicv() || xapic_id_writable)
enable_apicv = 0;
if (!enable_apicv)
vmx_x86_ops.sync_pir_to_irr = NULL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eaad2f485b64..ef6eba8c832a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -174,6 +174,10 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
static int __read_mostly lapic_timer_advance_ns = -1;
module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);
+bool __read_mostly xapic_id_writable;
+module_param(xapic_id_writable, bool, 0444);
+EXPORT_SYMBOL_GPL(xapic_id_writable);
+
static bool __read_mostly vector_hashing = true;
module_param(vector_hashing, bool, S_IRUGO);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1ebd5a7594da..142663ff9cba 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -346,6 +346,7 @@ static inline bool kvm_mpx_supported(void)
extern unsigned int min_timer_period_us;
+extern bool xapic_id_writable;
extern bool enable_vmware_backdoor;
extern int pi_inject_timer;
next prev parent reply other threads:[~2022-02-03 20:22 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-31 14:28 [PATCH v5 0/8] IPI virtualization support for VM Zeng Guang
2021-12-31 14:28 ` [PATCH v5 1/8] x86/cpu: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
2021-12-31 14:28 ` [PATCH v5 2/8] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
2021-12-31 14:28 ` [PATCH v5 3/8] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
2021-12-31 14:28 ` [PATCH v5 4/8] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Zeng Guang
2022-01-13 21:03 ` Sean Christopherson
2022-01-14 4:19 ` Zeng Guang
2022-01-20 1:06 ` Sean Christopherson
2022-01-20 5:34 ` Zeng Guang
2021-12-31 14:28 ` [PATCH v5 5/8] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit Zeng Guang
2022-01-13 21:29 ` Sean Christopherson
2022-01-14 7:52 ` Zeng Guang
2022-01-14 17:34 ` Sean Christopherson
2022-01-15 2:08 ` Zeng Guang
2022-01-18 0:44 ` Yuan Yao
2022-01-18 3:06 ` Zeng Guang
2022-01-18 18:17 ` Sean Christopherson
2022-01-19 2:48 ` Zeng Guang
2021-12-31 14:28 ` [PATCH v5 6/8] KVM: VMX: enable IPI virtualization Zeng Guang
2022-01-13 21:47 ` Sean Christopherson
2022-01-14 5:36 ` Zeng Guang
2021-12-31 14:28 ` [PATCH v5 7/8] KVM: VMX: Update PID-pointer table entry when APIC ID is changed Zeng Guang
2022-01-05 19:13 ` Tom Lendacky
2022-01-06 1:44 ` Zeng Guang
2022-01-06 14:06 ` Tom Lendacky
2022-01-07 8:05 ` Zeng Guang
2022-01-07 8:31 ` Maxim Levitsky
2022-01-10 7:45 ` Chao Gao
2022-01-10 22:24 ` Maxim Levitsky
2022-01-13 22:19 ` Sean Christopherson
2022-01-14 2:58 ` Chao Gao
2022-01-14 8:17 ` Maxim Levitsky
2022-01-17 3:17 ` Chao Gao
2022-02-02 23:23 ` Sean Christopherson
2022-02-03 20:22 ` Sean Christopherson [this message]
2022-02-23 6:10 ` Chao Gao
2022-02-23 10:26 ` Maxim Levitsky
2022-01-14 0:22 ` Yuan Yao
2021-12-31 14:28 ` [PATCH v5 8/8] KVM: VMX: Resize PID-ponter table on demand for IPI virtualization Zeng Guang
2022-01-13 22:09 ` Sean Christopherson
2022-01-14 15:59 ` Zeng Guang
2022-01-14 16:18 ` Sean Christopherson
2022-01-17 15:04 ` Zeng Guang
2022-01-18 17:15 ` Sean Christopherson
2022-01-19 7:55 ` Zeng Guang
2022-01-20 1:01 ` Sean Christopherson
2022-01-24 16:40 ` Zeng Guang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yfw5ddGNOnDqxMLs@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=chao.gao@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=guang.zeng@intel.com \
--cc=hpa@zytor.com \
--cc=jarkko@kernel.org \
--cc=jethro@fortanix.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kai.huang@intel.com \
--cc=kan.liang@linux.intel.com \
--cc=kim.phillips@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=robert.hu@intel.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=tony.luck@intel.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).