* [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
@ 2016-07-06 13:43 Paolo Bonzini
2016-07-06 14:18 ` Borislav Petkov
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-06 13:43 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: stable, Borislav Petkov
I don't know what I was thinking when I wrote commit 46896c73c1a4 ("KVM:
svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
obviously uses MSR_TSC_AUX.
Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/svm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 16ef31b87452..44f6368f8b45 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -43,6 +43,7 @@
 #include <asm/kvm_para.h>

 #include <asm/virtext.h>
+#include <asm/vgtod.h>
 #include "trace.h"

 #define __ex(x) __kvm_handle_fault_on_reboot(x)
@@ -1530,9 +1531,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
 		}
 	}
-	/* This assumes that the kernel never uses MSR_TSC_AUX */
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

 	avic_vcpu_load(vcpu, cpu);
 }
@@ -4474,6 +4472,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;

 	clgi();
+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

 	local_irq_enable();

@@ -4550,6 +4550,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		);

+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		wrmsrl(MSR_TSC_AUX, __getcpu());
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
--
1.8.3.1
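
The save/restore pattern that the patch moves into svm_vcpu_run can be illustrated with a small userspace sketch. It does not touch real MSRs: fake_msr_tsc_aux, toy_vcpu, toy_wrmsr and toy_vcpu_run are hypothetical names standing in for the hardware register, the vCPU state, wrmsrl() and svm_vcpu_run(), so this is only a model of the idea, not the kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the physical MSR_TSC_AUX register. */
static uint64_t fake_msr_tsc_aux;

/* Hypothetical, minimal vCPU state: only the guest's TSC_AUX value. */
struct toy_vcpu {
	uint64_t guest_tsc_aux;
};

/* Models wrmsrl(MSR_TSC_AUX, value). */
static void toy_wrmsr(uint64_t value)
{
	fake_msr_tsc_aux = value;
}

/*
 * Models one guest entry/exit: load the guest value right before the
 * guest runs and put the host value back right after it exits.
 * Returns what a guest RDTSCP would have seen in ECX.
 */
static uint64_t toy_vcpu_run(struct toy_vcpu *vcpu, uint64_t host_tsc_aux)
{
	uint64_t seen_by_guest;

	toy_wrmsr(vcpu->guest_tsc_aux);   /* before entering the guest */
	seen_by_guest = fake_msr_tsc_aux; /* the guest runs here */
	toy_wrmsr(host_tsc_aux);          /* after the guest exits */

	return seen_by_guest;
}
```

Calling toy_vcpu_run() with a guest value of 7 and a host value of 3 hands 7 to the guest and leaves 3 in the register afterwards, which is the property the original vcpu_load-only approach lacked.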
^ permalink raw reply related [flat|nested] 20+ messages in thread

* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-06 13:43 [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX Paolo Bonzini
@ 2016-07-06 14:18 ` Borislav Petkov
2016-07-06 14:29 ` Paolo Bonzini
2016-07-06 15:00 ` kbuild test robot
2016-07-15 12:15 ` Radim Krčmář
2 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-06 14:18 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, stable

On Wed, Jul 06, 2016 at 03:43:16PM +0200, Paolo Bonzini wrote:
> I don't know what I was thinking when I wrote commit 46896c73c1a4 ("KVM:
> svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
> obviously uses MSR_TSC_AUX.
>
> Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.
>
> Cc: stable@vger.kernel.org
> Cc: Borislav Petkov <bp@alien8.de>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Something's still missing. I have a small program which does RDTSCP in
the guest:

$ taskset -c 3 ./rdtscp
aux1: 0x0
aux2: 0x0
p1: 195514968442, p2: 195515255582, 287140

and the aux things which are %ecx, are 0 (should be 3 in that case).

It did work with my patch with the RDTSCP intercept:

$ taskset -c 3 ./rdtscp
aux1: 0x3
aux2: 0x3
p1: 157117003683, p2: 157119280794, 2277111

Btw, just for my own understanding: if we don't intercept RDTSCP, does
it get emulated? Where does the TSC value come from, qemu?

Here's the program.

---

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef unsigned long long u64;

#define DECLARE_ARGS(val, low, high) unsigned low, high
#define EAX_EDX_VAL(val, low, high) ((low) | ((u64)(high) << 32))
#define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high)
#define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high)

static __always_inline unsigned long long rdtscp(unsigned int *aux)
{
	unsigned int lo, hi;

	asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (*aux));

	return EAX_EDX_VAL(0, lo, hi);
}

int main()
{
	unsigned long long p1, p2;
	unsigned int aux;

	p1 = rdtscp(&aux);
	printf("aux1: 0x%x\n", aux);
	p2 = rdtscp(&aux);
	printf("aux2: 0x%x\n", aux);

	printf("p1: %llu, p2: %llu, %lld\n", p1, p2, p2 - p1);

	return 0;
}

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
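
The expectation that the aux value is 3 on CPU 3 follows from how Linux fills MSR_TSC_AUX. As far as I know, kernels of this era store the CPU number in the low 12 bits and the NUMA node in the bits above, so a value of 0x3 means CPU 3 on node 0. The sketch below decodes a value under that assumption; AUX_CPU_MASK, aux_to_cpu and aux_to_node are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed layout: low 12 bits hold the CPU number, the rest the node. */
#define AUX_CPU_MASK 0xfffu
#define AUX_NODE_SHIFT 12

/* Extract the CPU number from a TSC_AUX value as returned in ECX. */
static unsigned int aux_to_cpu(uint32_t aux)
{
	return aux & AUX_CPU_MASK;
}

/* Extract the NUMA node from a TSC_AUX value as returned in ECX. */
static unsigned int aux_to_node(uint32_t aux)
{
	return aux >> AUX_NODE_SHIFT;
}
```

With this layout, the "aux1: 0x3" output above decodes to CPU 3, node 0, while an all-zero value on CPU 3 indicates that the guest never saw its own TSC_AUX.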
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-06 14:18 ` Borislav Petkov
@ 2016-07-06 14:29 ` Paolo Bonzini
2016-07-07 10:41 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-06 14:29 UTC (permalink / raw)
To: Borislav Petkov; +Cc: linux-kernel, kvm, stable

On 06/07/2016 16:18, Borislav Petkov wrote:
> Something's still missing. I have a small program which does RDTSCP in
> the guest:
>
> $ taskset -c 3 ./rdtscp
> aux1: 0x0
> aux2: 0x0
> p1: 195514968442, p2: 195515255582, 287140
>
> and the aux things which are %ecx, are 0 (should be 3 in that case).

Ok, I'll take a look at it tomorrow.

Can you test this in the meanwhile:

git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
cd kvm-unit-tests
./configure
make
./x86/run x86/tsc.flat -cpu kvm64,+rdtscp

On Intel I see:

enabling apic
rdtsc latency 18
rdtsc after wrtsc(0): 727124155
rdtsc after wrtsc(100000000000): 100000001759
PASS: Test RDTSCP 0
PASS: Test RDTSCP 10
PASS: Test RDTSCP 256
SUMMARY: 3 tests

> It did work with my patch with the RDTSCP intercept:
>
> $ taskset -c 3 ./rdtscp
> aux1: 0x3
> aux2: 0x3
> p1: 157117003683, p2: 157119280794, 2277111
>
> Btw, just for my own understanding: if we don't intercept RDTSCP, does
> it get emulated? Where does the TSC value come from, qemu?

It comes from the processor's TSC + the TSC offset field in the VMCB.

Paolo

>
> Here's the program.
>
> ---
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> typedef unsigned long long u64;
>
> #define DECLARE_ARGS(val, low, high) unsigned low, high
> #define EAX_EDX_VAL(val, low, high) ((low) | ((u64)(high) << 32))
> #define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high)
> #define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high)
>
> static __always_inline unsigned long long rdtscp(unsigned int *aux)
> {
> 	unsigned int lo, hi;
>
> 	asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (*aux));
>
> 	return EAX_EDX_VAL(0, lo, hi);
> }
>
> int main()
> {
> 	unsigned long long p1, p2;
> 	unsigned int aux;
>
> 	p1 = rdtscp(&aux);
> 	printf("aux1: 0x%x\n", aux);
> 	p2 = rdtscp(&aux);
> 	printf("aux2: 0x%x\n", aux);
>
> 	printf("p1: %llu, p2: %llu, %lld\n", p1, p2, p2 - p1);
>
> 	return 0;
> }
>
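
Paolo's answer (guest TSC = processor TSC + the VMCB TSC offset) can be written down as plain 64-bit arithmetic. The sketch below is a simplified model that ignores TSC scaling; guest_tsc and offset_for_guest_tsc are hypothetical helper names, not KVM functions.

```c
#include <stdint.h>

/*
 * Value a non-intercepted RDTSC/RDTSCP returns in the guest: the host
 * counter plus the offset, with ordinary 64-bit wraparound.
 */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
	return host_tsc + tsc_offset;
}

/*
 * Offset a hypervisor would program so that the guest reads "wanted"
 * at the moment the host counter reads "host_tsc". The result is
 * usually a large (two's complement negative) number.
 */
static uint64_t offset_for_guest_tsc(uint64_t host_tsc, uint64_t wanted)
{
	return wanted - host_tsc;
}
```

For example, an offset computed for a guest value of 0 at host time 1000 yields a guest reading of 500 once the host counter reaches 1500. No exit to qemu is involved in this path; the addition happens in hardware.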
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-06 14:29 ` Paolo Bonzini
@ 2016-07-07 10:41 ` Borislav Petkov
2016-07-07 11:01 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-07 10:41 UTC (permalink / raw)
To: Paolo Bonzini, Eduardo Habkost; +Cc: linux-kernel, kvm, stable

On Wed, Jul 06, 2016 at 04:29:35PM +0200, Paolo Bonzini wrote:
> Can you test this in the meanwhile:
>
> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
> cd kvm-unit-tests
> ./configure
> make
> ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
>
> On Intel I see:
>
> enabling apic
> rdtsc latency 18
> rdtsc after wrtsc(0): 727124155
> rdtsc after wrtsc(100000000000): 100000001759
> PASS: Test RDTSCP 0
> PASS: Test RDTSCP 10
> PASS: Test RDTSCP 256
> SUMMARY: 3 tests

Ok, found it: I need to start the guest with "+rdtscp", see below.

Which begs the question: can we readd CPUID_EXT2_RDTSCP to the Opteron_*
models as in the second diff here:

https://lkml.kernel.org/r/20160706124438.GB7300@pd.tnic

?

Or are we still afraid of "host doesn't support requested feature"
messages from:

33b5e8c03ae7 ("target-i386: Disable rdtscp on Opteron_G* CPU models")

?

$ ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/tsc.flat -cpu kvm64,+rdtscp
enabling apic
rdtsc latency 68
rdtsc after wrtsc(0): 988590164
rdtsc after wrtsc(100000000000): 100000002807
PASS: Test RDTSCP 0
PASS: Test RDTSCP 10
PASS: Test RDTSCP 256

SUMMARY: 3 tests

latest qemu:

$ QEMU=/root/src/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
/root/src/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/tsc.flat -cpu kvm64,+rdtscp
enabling apic
rdtsc latency 69
rdtsc after wrtsc(0): 715310314
rdtsc after wrtsc(100000000000): 100000002434
PASS: Test RDTSCP 0
PASS: Test RDTSCP 10
PASS: Test RDTSCP 256

SUMMARY: 3 tests

guest booted *without* "+rdtscp":

taskset -c 3 ./rdtscp
aux1: 0x0
aux2: 0x0
p1: 284683839780, p2: 284684080314, 240534

with "+rdtscp":

$ taskset -c 3 ./rdtscp
aux1: 0x3
aux2: 0x3
p1: 168907589830, p2: 168907856068, 266238
$ taskset -c 2 ./rdtscp
aux1: 0x2
aux2: 0x2
p1: 176251144121, p2: 176251427089, 282968

Ok, all good.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 10:41 ` Borislav Petkov
@ 2016-07-07 11:01 ` Paolo Bonzini
2016-07-07 11:47 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-07 11:01 UTC (permalink / raw)
To: Borislav Petkov, Eduardo Habkost; +Cc: linux-kernel, kvm, stable

On 07/07/2016 12:41, Borislav Petkov wrote:
> On Wed, Jul 06, 2016 at 04:29:35PM +0200, Paolo Bonzini wrote:
>> Can you test this in the meanwhile:
>>
>> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
>> cd kvm-unit-tests
>> ./configure
>> make
>> ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
>>
>> On Intel I see:
>>
>> enabling apic
>> rdtsc latency 18
>> rdtsc after wrtsc(0): 727124155
>> rdtsc after wrtsc(100000000000): 100000001759
>> PASS: Test RDTSCP 0
>> PASS: Test RDTSCP 10
>> PASS: Test RDTSCP 256
>> SUMMARY: 3 tests
>
> Ok, found it: I need to start the guest with "+rdtscp", see below.

Yeah, I thought you were using your own QEMU patch which adds it to
Opteron CPU models.

> Which begs the question: can we readd CPUID_EXT2_RDTSCP to the Opteron_*
> models as in the second diff here:

We'll probably add -cpu Opteron_G2_rdtscp but we will indeed do
something like that.

Paolo

> Or are we still afraid of "host doesn't support requested feature"
> messages from:
>
> 33b5e8c03ae7 ("target-i386: Disable rdtscp on Opteron_G* CPU models")
>
> ?
>
> $ ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
> qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/tsc.flat -cpu kvm64,+rdtscp
> enabling apic
> rdtsc latency 68
> rdtsc after wrtsc(0): 988590164
> rdtsc after wrtsc(100000000000): 100000002807
> PASS: Test RDTSCP 0
> PASS: Test RDTSCP 10
> PASS: Test RDTSCP 256
>
> SUMMARY: 3 tests
>
> latest qemu:
>
> $ QEMU=/root/src/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 ./x86/run x86/tsc.flat -cpu kvm64,+rdtscp
> /root/src/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/tsc.flat -cpu kvm64,+rdtscp
> enabling apic
> rdtsc latency 69
> rdtsc after wrtsc(0): 715310314
> rdtsc after wrtsc(100000000000): 100000002434
> PASS: Test RDTSCP 0
> PASS: Test RDTSCP 10
> PASS: Test RDTSCP 256
>
> SUMMARY: 3 tests
>
> guest booted *without* "+rdtscp":
>
> taskset -c 3 ./rdtscp
> aux1: 0x0
> aux2: 0x0
> p1: 284683839780, p2: 284684080314, 240534
>
> with "+rdtscp":
>
> $ taskset -c 3 ./rdtscp
> aux1: 0x3
> aux2: 0x3
> p1: 168907589830, p2: 168907856068, 266238
> $ taskset -c 2 ./rdtscp
> aux1: 0x2
> aux2: 0x2
> p1: 176251144121, p2: 176251427089, 282968
>
> Ok, all good.
>
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 11:01 ` Paolo Bonzini
@ 2016-07-07 11:47 ` Borislav Petkov
2016-07-07 12:28 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-07 11:47 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Eduardo Habkost, linux-kernel, kvm, stable

On Thu, Jul 07, 2016 at 01:01:34PM +0200, Paolo Bonzini wrote:
> We'll probably add -cpu Opteron_G2_rdtscp but we will indeed do
> something like that.

Why a separate CPU model and not change the existing ones?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 11:47 ` Borislav Petkov
@ 2016-07-07 12:28 ` Paolo Bonzini
2016-07-07 12:47 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-07 12:28 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Eduardo Habkost, linux-kernel, kvm, stable

On 07/07/2016 13:47, Borislav Petkov wrote:
> On Thu, Jul 07, 2016 at 01:01:34PM +0200, Paolo Bonzini wrote:
>> We'll probably add -cpu Opteron_G2_rdtscp but we will indeed do
>> something like that.
>
> Why a separate CPU model and not change the existing ones?

Because otherwise you couldn't do live migration from new QEMU + new
kernel to new QEMU + old kernel. QEMU tries to avoid requiring lockstep
upgrades of QEMU and KVM (unlike for example perf).

Paolo
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 12:28 ` Paolo Bonzini
@ 2016-07-07 12:47 ` Borislav Petkov
2016-07-07 13:16 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-07 12:47 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Eduardo Habkost, linux-kernel, kvm, stable

On Thu, Jul 07, 2016 at 02:28:29PM +0200, Paolo Bonzini wrote:
> Because otherwise you couldn't do live migration from new QEMU + new
> kernel to new QEMU + old kernel. QEMU tries to avoid requiring lockstep
> upgrades of QEMU and KVM (unlike for example perf).

Hmm, ok.

About that - and I've asked about it a couple of times already - how
would you guys feel about a testing feature to qemu - something I'd love
to have with which I can set arbitrary CPUID bits for testing kernels?

I.e., something like that:

qemu ... -cpu=Opteron_G5,cpuid_leaf=<bla>,eax=<..>,ebx=<...>, ...,filter=off

The filter=off thing is to disable the checking in
x86_cpu_filter_features() so that those arbitrary CPUID leafs are
actually simulated to the guest.

Would something like that make sense for upstream or should I hack it in
locally only?

Because it sure does help a lot when testing kernel features for
unreleased CPUs but for which the code is already being submitted. And
with a qemu feature like that, we could at least smoke-test those a bit.

Hmmm?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 12:47 ` Borislav Petkov
@ 2016-07-07 13:16 ` Paolo Bonzini
2016-07-07 16:01 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-07 13:16 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Eduardo Habkost, linux-kernel, kvm, stable

On 07/07/2016 14:47, Borislav Petkov wrote:
> On Thu, Jul 07, 2016 at 02:28:29PM +0200, Paolo Bonzini wrote:
>> Because otherwise you couldn't do live migration from new QEMU + new
>> kernel to new QEMU + old kernel. QEMU tries to avoid requiring lockstep
>> upgrades of QEMU and KVM (unlike for example perf).
>
> Hmm, ok.
>
> About that - and I've asked about it a couple of times already - how
> would you guys feel about a testing feature to qemu - something I'd love
> to have with which I can set arbitrary CPUID bits for testing kernels?

Eduardo is the one to answer, but usually we add features to QEMU
before the processors are released (typically as soon as KVM supports
them). So with a new enough QEMU this in theory should not be
necessary.

Adding a new feature that's not in a CPU model and that's not
associated to new state is really trivial:

commit f7fda280948a5e74aeb076ef346b991ecb173c56
Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Date:   Thu Oct 29 15:31:39 2015 +0800

    target-i386: Enable clflushopt/clwb/pcommit instructions

    These instructions are used by NVDIMM drivers and the specification is
    located at:
    https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

    There instructions are available on Skylake Server.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 090d1d8..0d080c1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -259,8 +259,8 @@ static const char *svm_feature_name[] = {
 static const char *cpuid_7_0_ebx_feature_name[] = {
     "fsgsbase", "tsc_adjust", NULL, "bmi1", "hle", "avx2", NULL, "smep",
     "bmi2", "erms", "invpcid", "rtm", NULL, NULL, "mpx", NULL,
-    "avx512f", NULL, "rdseed", "adx", "smap", NULL, NULL, NULL,
-    NULL, NULL, "avx512pf", "avx512er", "avx512cd", NULL, NULL, NULL,
+    "avx512f", NULL, "rdseed", "adx", "smap", NULL, "pcommit", "clflushopt",
+    "clwb", NULL, "avx512pf", "avx512er", "avx512cd", NULL, NULL, NULL,
 };

Paolo

> I.e., something like that:
>
> qemu ... -cpu=Opteron_G5,cpuid_leaf=<bla>,eax=<..>,ebx=<...>, ...,filter=off
>
> The filter=off thing is to disable the checking in
> x86_cpu_filter_features() so that those arbitrary CPUID leafs are
> actually simulated to the guest.
>
> Would something like that make sense for upstream or should I hack it in
> locally only?
>
> Because it sure does help a lot when testing kernel features for
> unreleased CPUs but for which the code is already being submitted. And
> with a qemu feature like that, we could at least smoke-test those a bit.
>
> Hmmm?
>
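
The reason the quoted QEMU commit is so small is that each feature-name table is indexed by bit position: entry i names bit i of the corresponding CPUID register, and NULL marks a bit without a name. The sketch below copies the post-patch table from the diff and adds a lookup helper; toy_7_0_ebx_names and feature_bit are hypothetical names, and QEMU's real lookup code is organised differently.

```c
#include <stddef.h>
#include <string.h>

/* Copy of the patched table: index == bit number in CPUID[7,0].EBX. */
static const char *toy_7_0_ebx_names[32] = {
	"fsgsbase", "tsc_adjust", NULL, "bmi1", "hle", "avx2", NULL, "smep",
	"bmi2", "erms", "invpcid", "rtm", NULL, NULL, "mpx", NULL,
	"avx512f", NULL, "rdseed", "adx", "smap", NULL, "pcommit", "clflushopt",
	"clwb", NULL, "avx512pf", "avx512er", "avx512cd", NULL, NULL, NULL,
};

/*
 * Return the bit number whose name matches, or -1 if the table has no
 * entry with that name.
 */
static int feature_bit(const char *const *names, const char *name)
{
	int i;

	for (i = 0; i < 32; i++) {
		if (names[i] && strcmp(names[i], name) == 0)
			return i;
	}

	return -1;
}
```

Looking up "clwb" in this table gives bit 24, so a "+clwb" on the command line would translate to setting bit 24 in that feature word.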
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 13:16 ` Paolo Bonzini
@ 2016-07-07 16:01 ` Borislav Petkov
2016-07-07 16:17 ` Paolo Bonzini
2016-07-07 16:27 ` Eduardo Habkost
0 siblings, 2 replies; 20+ messages in thread
From: Borislav Petkov @ 2016-07-07 16:01 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Eduardo Habkost, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

On Thu, Jul 07, 2016 at 03:16:21PM +0200, Paolo Bonzini wrote:
> Eduardo is the one to answer, but usually we add features to QEMU
> before the processors are released (typically as soon as KVM supports
> them). So with a new enough QEMU this in theory should not be
> necessary.
>
> Adding a new feature that's not in a CPU model and that's not
> associated to new state is really trivial:

Cool.

Btw, how about something like this?

Specifically, I'd like to test RAS features on the new upcoming AMD
Zen CPU and I've defined one from the stuff we know so far from kernel
patches.

The "filter=off" thing I've added in case I want to disable
x86_cpu_filter_features() but it works just fine without it when I boot
with -cpu Zen. So I can remove it too.

Would something like that be acceptable?

We can continue improving on this as features become known and even
implement some functionality in qemu/kvm as time allows.

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 5 Jul 2016 16:12:18 +0200
Subject: [PATCH] Zen emu: first working version

Boot with "-c Zen,filter=off" to disable CPUID bits filtering.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 target-i386/cpu.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 target-i386/cpu.h | 7 +++++++
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3bd3cfc3ad16..cc9c97457387 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -307,6 +307,17 @@ static const char *cpuid_6_feature_name[] = {
     NULL, NULL, NULL, NULL,
 };

+static const char *smca_feature_name[] = {
+    "overflow_recov", "succor", NULL, "smca",
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+};
+
 #define I486_FEATURES (CPUID_FP87 | CPUID_VME | CPUID_PSE)
 #define PENTIUM_FEATURES (I486_FEATURES | CPUID_DE | CPUID_TSC | \
           CPUID_MSR | CPUID_MCE | CPUID_CX8 | CPUID_MMX | CPUID_APIC)
@@ -449,6 +460,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .cpuid_eax = 6, .cpuid_reg = R_EAX,
         .tcg_features = TCG_6_EAX_FEATURES,
     },
+    [FEAT_8000_0007_EBX] = {
+        .feat_names = smca_feature_name,
+        .cpuid_eax = 0x80000007,
+        .cpuid_reg = R_EBX,
+    },
 };

 typedef struct X86RegisterInfo32 {
@@ -1449,6 +1465,44 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .xlevel = 0x8000001A,
         .model_id = "AMD Opteron 63xx class CPU",
     },
+    {
+        .name = "Zen",
+        .level = 0xd,
+        .vendor = CPUID_VENDOR_AMD,
+        .family = 23,
+        .model = 0,
+        .stepping = 0,
+        .features[FEAT_1_EDX] =
+            CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX |
+            CPUID_CLFLUSH | CPUID_PSE36 | CPUID_PAT | CPUID_CMOV | CPUID_MCA |
+            CPUID_PGE | CPUID_MTRR | CPUID_SEP | CPUID_APIC | CPUID_CX8 |
+            CPUID_MCE | CPUID_PAE | CPUID_MSR | CPUID_TSC | CPUID_PSE |
+            CPUID_DE | CPUID_FP87,
+        .features[FEAT_1_ECX] =
+            CPUID_EXT_F16C | CPUID_EXT_AVX | CPUID_EXT_XSAVE |
+            CPUID_EXT_AES | CPUID_EXT_POPCNT | CPUID_EXT_SSE42 |
+            CPUID_EXT_SSE41 | CPUID_EXT_CX16 | CPUID_EXT_FMA |
+            CPUID_EXT_SSSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3,
+        .features[FEAT_8000_0001_EDX] =
+            CPUID_EXT2_LM | CPUID_EXT2_RDTSCP |
+            CPUID_EXT2_PDPE1GB | CPUID_EXT2_FXSR | CPUID_EXT2_MMX |
+            CPUID_EXT2_NX | CPUID_EXT2_PSE36 | CPUID_EXT2_PAT |
+            CPUID_EXT2_CMOV | CPUID_EXT2_MCA | CPUID_EXT2_PGE |
+            CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL | CPUID_EXT2_APIC |
+            CPUID_EXT2_CX8 | CPUID_EXT2_MCE | CPUID_EXT2_PAE | CPUID_EXT2_MSR |
+            CPUID_EXT2_TSC | CPUID_EXT2_PSE | CPUID_EXT2_DE | CPUID_EXT2_FPU,
+        .features[FEAT_8000_0001_ECX] =
+            CPUID_EXT3_TBM | CPUID_EXT3_FMA4 | CPUID_EXT3_XOP |
+            CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_MISALIGNSSE |
+            CPUID_EXT3_SSE4A | CPUID_EXT3_ABM | CPUID_EXT3_SVM |
+            CPUID_EXT3_LAHF_LM,
+        /* no xsaveopt! */
+        .features[FEAT_8000_0007_EBX] =
+            CPUID_OVERFLOW_RECOV | CPUID_SUCCOR | CPUID_SMCA,
+        .xlevel = 0x8000001A,
+        .model_id = "AMD Zen CPU",
+    },
+
 };

 typedef struct PropValue {
@@ -2118,6 +2172,9 @@ static int x86_cpu_filter_features(X86CPU *cpu)
     FeatureWord w;
     int rv = 0;

+    if (!cpu->filter_cpuid)
+        return 0;
+
     for (w = 0; w < FEATURE_WORDS; w++) {
         uint32_t host_feat =
             x86_cpu_get_supported_feature_word(w, cpu->migratable);
@@ -2596,7 +2653,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     case 0x80000007:
         *eax = 0;
-        *ebx = 0;
+        *ebx = env->features[FEAT_8000_0007_EBX];
         *ecx = 0;
         *edx = env->features[FEAT_8000_0007_EDX];
         break;
@@ -3256,6 +3313,7 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
     DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
     DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
+    DEFINE_PROP_BOOL("filter", X86CPU, filter_cpuid, false),
     DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
     DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
     DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 474b0b937d71..258c1b261cd2 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -443,6 +443,7 @@ typedef enum FeatureWord {
     FEAT_SVM, /* CPUID[8000_000A].EDX */
     FEAT_XSAVE, /* CPUID[EAX=0xd,ECX=1].EAX */
     FEAT_6_EAX, /* CPUID[6].EAX */
+    FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
     FEATURE_WORDS,
 } FeatureWord;

@@ -620,6 +621,11 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_APM_INVTSC (1U << 8)

 #define CPUID_VENDOR_SZ 12
+/* CPUID[0x80000007].EBX flags: */
+#define CPUID_OVERFLOW_RECOV (1U << 0) /* MCA overflow recovery support */
+#define CPUID_SUCCOR (1U << 1) /* Uncorrectable error containment and recovery */
+#define CPUID_SMCA (1U << 3) /* Scalable MCA */
+

 #define CPUID_VENDOR_INTEL_1 0x756e6547 /* "Genu" */
 #define CPUID_VENDOR_INTEL_2 0x49656e69 /* "ineI" */
@@ -1160,6 +1166,7 @@ struct X86CPU {
     bool hyperv_stimer;
     bool check_cpuid;
     bool enforce_cpuid;
+    bool filter_cpuid;
     bool expose_kvm;
     bool migratable;
     bool host_features;
-- 
2.7.3

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
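
The effect of the proposed "filter" switch can be reduced to a single mask operation per feature word. The sketch below is a simplified model and not QEMU code: effective_features is a hypothetical helper, the TOY_* constants mirror the three CPUID[0x80000007].EBX bits from the patch, and the host mask in the example is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the CPUID_* defines added by the patch. */
#define TOY_OVERFLOW_RECOV (1U << 0)
#define TOY_SUCCOR         (1U << 1)
#define TOY_SMCA           (1U << 3)

/*
 * With filtering on, the guest only gets the requested bits that the
 * host (KVM) also reports as supported. With filtering off, the
 * requested bits are passed through unchanged, supported or not.
 */
static uint32_t effective_features(uint32_t requested,
				   uint32_t host_supported, bool filter)
{
	return filter ? (requested & host_supported) : requested;
}
```

On a host that reports only the overflow-recovery bit, requesting all three bits yields just that one bit with filtering on, and all three with filtering off, which is the smoke-testing use case described in the mail.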
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 16:01 ` Borislav Petkov
@ 2016-07-07 16:17 ` Paolo Bonzini
2016-07-07 16:27 ` Eduardo Habkost
1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-07 16:17 UTC (permalink / raw)
To: Borislav Petkov
Cc: Eduardo Habkost, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

> On Thu, Jul 07, 2016 at 03:16:21PM +0200, Paolo Bonzini wrote:
> > Eduardo is the one to answer, but usually we add features to QEMU
> > before the processors are released (typically as soon as KVM supports
> > them). So with a new enough QEMU this in theory should not be
> > necessary.
> >
> > Adding a new feature that's not in a CPU model and that's not
> > associated to new state is really trivial:
>
> Cool.
>
> Btw, how about something like this?
>
> Specifically, I'd like to test RAS features on the new upcoming AMD
> Zen CPU and I've defined one from the stuff we know so far from kernel
> patches.

It looks good from skimming it---but again this isn't quite my territory.

Paolo

> The "filter=off" thing I've added in case I want to disable
> x86_cpu_filter_features() but it works just fine without it when I boot
> with -cpu Zen. So I can remove it too.
>
> Would something like that be acceptable?
>
> We can continue improving on this as features become known and even
> implement some functionality in qemu/kvm as time allows.
>
> [...]
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 16:01 ` Borislav Petkov
2016-07-07 16:17 ` Paolo Bonzini
@ 2016-07-07 16:27 ` Eduardo Habkost
2016-07-07 17:04 ` Borislav Petkov
1 sibling, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2016-07-07 16:27 UTC (permalink / raw)
To: Borislav Petkov
Cc: Paolo Bonzini, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

On Thu, Jul 07, 2016 at 06:01:46PM +0200, Borislav Petkov wrote:
> On Thu, Jul 07, 2016 at 03:16:21PM +0200, Paolo Bonzini wrote:
> > Eduardo is the one to answer, but usually we add features to QEMU
> > before the processors are released (typically as soon as KVM supports
> > them). So with a new enough QEMU this in theory should not be
> > necessary.
> >
> > Adding a new feature that's not in a CPU model and that's not
> > associated to new state is really trivial:
>
> Cool.
>
> Btw, how about something like this?
>
> Specifically, I'd like to test RAS features on the new upcoming AMD
> Zen CPU and I've defined one from the stuff we know so far from kernel
> patches.

You mean KVM kernel patches?

I assume the features require additional KVM code to support them
in guests. In that case, why wouldn't the kernel return them in
GET_SUPPORTED_CPUID? Then you won't need filter=off.

>
> The "filter=off" thing I've added in case I want to disable
> x86_cpu_filter_features() but it works just fine without it when I boot
> with -cpu Zen. So I can remove it too.
>
> Would something like that be acceptable?

About filter=off: not sure. Do we really have valid use cases to
enable a feature even if the kernel reports it as unsupported in
GET_SUPPORTED_CPUID?

Specifically about the feature names, I have some question below:

>
> We can continue improving on this as features become known and even
> implement some functionality in qemu/kvm as time allows.
>
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Tue, 5 Jul 2016 16:12:18 +0200
> Subject: [PATCH] Zen emu: first working version
>
> Boot with "-c Zen,filter=off" to disable CPUID bits filtering.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  target-i386/cpu.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  target-i386/cpu.h | 7 +++++++
>  2 files changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 3bd3cfc3ad16..cc9c97457387 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -307,6 +307,17 @@ static const char *cpuid_6_feature_name[] = {
>      NULL, NULL, NULL, NULL,
>  };
>
> +static const char *smca_feature_name[] = {
> +    "overflow_recov", "succor", NULL, "smca",

Do those features introduce additional state that need migration
support? If they do, you need to add them to
feature_word_info[FEAT_8000_0007_EBX].unmigratable_flags until
migration support is implemented.

> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +};
> +
>  #define I486_FEATURES (CPUID_FP87 | CPUID_VME | CPUID_PSE)
>  #define PENTIUM_FEATURES (I486_FEATURES | CPUID_DE | CPUID_TSC | \
>            CPUID_MSR | CPUID_MCE | CPUID_CX8 | CPUID_MMX | CPUID_APIC)
> @@ -449,6 +460,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          .cpuid_eax = 6, .cpuid_reg = R_EAX,
>          .tcg_features = TCG_6_EAX_FEATURES,
>      },
> +    [FEAT_8000_0007_EBX] = {
> +        .feat_names = smca_feature_name,
> +        .cpuid_eax = 0x80000007,
> +        .cpuid_reg = R_EBX,
> +    },
>  };
>
>  typedef struct X86RegisterInfo32 {
[...]

-- 
Eduardo
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 16:27 ` Eduardo Habkost
@ 2016-07-07 17:04 ` Borislav Petkov
2016-07-07 17:43 ` Eduardo Habkost
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-07 17:04 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Paolo Bonzini, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

On Thu, Jul 07, 2016 at 01:27:55PM -0300, Eduardo Habkost wrote:
> You mean KVM kernel patches?

No, other ones. Here's one example:

https://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com

> I assume the features require additional KVM code to support them
> in guests. In that case, why wouldn't the kernel return them in
> GET_SUPPORTED_CPUID? Then you won't need filter=off.

Yeah, so in most cases they will need additional KVM code to enable
them. More often than not, this is not always at the top of the TODO
list of people so ...

That's why I did the quick thing of smoke-testing them by enabling only
CPUID bits and the filter=off thing.

Would it be nicer to see them actually implemented in qemu/kvm?
Definitely.

> About filter=off: not sure. Do we really have valid use cases to
> enable a feature even if the kernel reports it as unsupported in
> GET_SUPPORTED_CPUID?

Yeah, as said above, the filter=off thing was a dirty hack just to stop
x86_cpu_filter_features() from checking whether the host supports them
or not.

> Do those features introduce additional state that need migration
> support? If they do, you need to add them to
> feature_word_info[FEAT_8000_0007_EBX].unmigratable_flags until
> migration support is implemented.

I'm afraid you'd need to explain migration support to me: is the
question whether migrating the guest to an Intel platform and whether
the features would still work?

Because those three above are AMD-only and they won't work on an Intel
platform.

And if so, I'm guessing they should always remain unmigratable.

Which is not a problem as there are Intel features which are not present
on AMD so...

Thanks!

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply [flat|nested] 20+ messages in thread
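Editorial sketch of the filter=off behaviour discussed in this message. It only models the idea that filtering masks the requested CPUID bits against what the host side reports as supported, and that filter=off skips the mask. It is not QEMU's x86_cpu_filter_features(): filter_features() and its parameters are hypothetical names, and only the three bit values are taken from the patch quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined by the quoted patch for CPUID[0x80000007].EBX. */
#define CPUID_OVERFLOW_RECOV (1U << 0) /* MCA overflow recovery support */
#define CPUID_SUCCOR         (1U << 1) /* Uncorrectable error containment */
#define CPUID_SMCA           (1U << 3) /* Scalable MCA */

/*
 * Hypothetical stand-in for the filtering step (an assumption, not QEMU code).
 * Returns the feature bits the guest would end up seeing:
 *  - filter == true:  bits the host does not report as supported are dropped;
 *  - filter == false: the request is passed through untouched, which is what
 *    the "dirty hack" in this thread amounts to.
 */
static uint32_t filter_features(uint32_t requested, uint32_t host_supported,
                                bool filter)
{
    return filter ? (requested & host_supported) : requested;
}
```

With filtering on, a request for SUCCOR and SMCA on a host that only reports SUCCOR would yield SUCCOR alone; with filtering off the guest would see both bits regardless of host support, which is why Eduardo asks for a case where that does not "crash and burn".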
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 17:04 ` Borislav Petkov
@ 2016-07-07 17:43 ` Eduardo Habkost
2016-07-08 11:09 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2016-07-07 17:43 UTC (permalink / raw)
To: Borislav Petkov
Cc: Paolo Bonzini, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

On Thu, Jul 07, 2016 at 07:04:42PM +0200, Borislav Petkov wrote:
> On Thu, Jul 07, 2016 at 01:27:55PM -0300, Eduardo Habkost wrote:
> > You mean KVM kernel patches?
>
> No, other ones. Here's one example:
>
> https://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
>
> > I assume the features require additional KVM code to support them
> > in guests. In that case, why wouldn't the kernel return them in
> > GET_SUPPORTED_CPUID? Then you won't need filter=off.
>
> Yeah, so in most cases they will need additional KVM code to enable
> them. More often than not, this is not always at the top of the TODO
> list of people so ...
>
> That's why I did the quick thing of smoke-testing them by enabling only
> CPUID bits and the filter=off thing.
>
> Would it be nicer to see them actually implemented in qemu/kvm?
> Definitely.
>
> > About filter=off: not sure. Do we really have valid use cases to
> > enable a feature even if the kernel reports it as unsupported in
> > GET_SUPPORTED_CPUID?
>
> Yeah, as said above, the filter=off thing was a dirty hack just to stop
> x86_cpu_filter_features() from checking whether the host supports them
> or not.

I see. If you have an useful use case for it, we may consider
that. But first I would like to see an actual case where a
feature was not added to GET_SUPPORTED_CPUID yet, but would not
crash and burn if forcibly enabled by QEMU.

>
> > Do those features introduce additional state that need migration
> > support? If they do, you need to add them to
> > feature_word_info[FEAT_8000_0007_EBX].unmigratable_flags until
> > migration support is implemented.
>
> I'm afraid you'd need to explain migration support to me: is the
> question whether migrating the guest to an Intel platform and whether
> the features would still work?
>
> Because those three above are AMD-only and they won't work on an Intel
> platform.
>
> And if so, I'm guessing they should always remain unmigratable.
>
> Which is not a problem as there are Intel features which are not present
> on AMD so...

I mean live migration to a different host (that normally has the
same CPU vendor). When you live-migrate or use savevm, you need
to send the machine state to the other host. This is implemented
using VMStateDescription structs describing the data to be
migrated. See vmstate_x86_cpu in target-i386/machine.c, for
example.

You need additional migration sections if the feature introduces
additional state (e.g. CPU registers) that need to be migrated
too, to keep the feature working. If there's new state but no
migration support is implemented yet, you need to add the feature
to unmigratable_flags.

For an example where no additional state is introduced by new
features, see:

Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Date: Thu Oct 29 15:31:39 2015 +0800

target-i386: Enable clflushopt/clwb/pcommit instructions

These instructions are used by NVDIMM drivers and the specification is
located at:
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

There instructions are available on Skylake Server.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

For an example where additional state is introduced by a CPU
feature and migration support was implemented, see:

commit f74eefe0b98cd7e13825de8e8d9f32e22aed102c
Author: Huaitong Han <huaitong.han@intel.com>
Date: Wed Nov 18 10:20:15 2015 +0800

target-i386: Add PKU and and OSPKE support

Add PKU and OSPKE CPUID features, including xsave state and
migration support.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: squashed 3 patches together, edited patch description]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

For an example where a feature was added without required
migration code and was added to unmigratable_flags, see:

commit 0bb0b2d2fe7f645ddaf1f0ff40ac669c9feb4aa1
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon Nov 24 15:54:43 2014 +0100

target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]

These represent xsave-related capabilities of the processor, and KVM may
or may not support them.

Add feature bits so that they are considered by "-cpu ...,enforce", and use
the new feature work instead of calling kvm_arch_get_supported_cpuid.

Bit 3 (XSAVES) is not migratables because it requires saving MSR_IA32_XSS.
Neither KVM nor any commonly available hardware supports it anyway.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

--
Eduardo

^ permalink raw reply [flat|nested] 20+ messages in thread
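Editorial sketch of the unmigratable_flags idea explained in this message. It is a deliberately simplified model: struct feature_word, word_is_migratable() and migratable_subset() are hypothetical names introduced here, not QEMU's FeatureWordInfo or any of its helpers. The point is only that a per-word mask marks bits whose extra state cannot be migrated yet, and a migratable configuration must not expose them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one CPUID feature word (not QEMU's struct). */
struct feature_word {
    uint32_t enabled;            /* bits currently exposed to the guest */
    uint32_t unmigratable_flags; /* bits that add state with no migration support yet */
};

/* A guest can be live-migrated only if no enabled bit is marked unmigratable. */
static bool word_is_migratable(const struct feature_word *w)
{
    return (w->enabled & w->unmigratable_flags) == 0;
}

/* The bits that a migration-safe configuration would keep. */
static uint32_t migratable_subset(const struct feature_word *w)
{
    return w->enabled & ~w->unmigratable_flags;
}
```

In this model, a feature that introduces no new state (like the clflushopt/clwb example) never appears in the mask, while one that needs an extra MSR saved (like XSAVES in the last example) stays in the mask until the migration code exists.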
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-07 17:43 ` Eduardo Habkost
@ 2016-07-08 11:09 ` Borislav Petkov
2016-07-08 11:15 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2016-07-08 11:09 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Paolo Bonzini, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

On Thu, Jul 07, 2016 at 02:43:49PM -0300, Eduardo Habkost wrote:
> I see. If you have an useful use case for it, we may consider
> that. But first I would like to see an actual case where a
> feature was not added to GET_SUPPORTED_CPUID yet, but would not
> crash and burn if forcibly enabled by QEMU.

Ok.

> I mean live migration to a different host (that normally has the
> same CPU vendor). When you live-migrate or use savevm, you need
> to send the machine state to the other host. This is implemented
> using VMStateDescription structs describing the data to be
> migrated. See vmstate_x86_cpu in target-i386/machine.c, for
> example.
>
> You need additional migration sections if the feature introduces
> additional state (e.g. CPU registers) that need to be migrated
> too, to keep the feature working. If there's new state but no
> migration support is implemented yet, you need to add the feature
> to unmigratable_flags.
>
> For an example where no additional state is introduced by new
> features, see:

Thanks for the examples and the explanation - I see the deal now.

Ok, I'll go through the features and see what kind of state the kernel
programs in there and add them to a VMStateDescription thing. Hohumm,
makes sense to me.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-08 11:09 ` Borislav Petkov
@ 2016-07-08 11:15 ` Paolo Bonzini
2016-07-08 12:55 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-08 11:15 UTC (permalink / raw)
To: Borislav Petkov
Cc: Eduardo Habkost, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh

> Ok, I'll go through the features and see what kind of state the kernel
> programs in there and add them to a VMStateDescription thing. Hohumm,
> makes sense to me.

It does sometimes happen that there is no state. For example it could be
an MSR that we are already getting in and out of KVM.

However, it is way more common that you have to add support for
reading/writing the MSR in KVM as well, and then teach QEMU's
target-i386/kvm.c about it as well.

It's hard to say without knowing exactly what the feature is about...
Is there an architecture manual out there that documents it?

Paolo

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-08 11:15 ` Paolo Bonzini
@ 2016-07-08 12:55 ` Borislav Petkov
0 siblings, 0 replies; 20+ messages in thread
From: Borislav Petkov @ 2016-07-08 12:55 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Eduardo Habkost, linux-kernel, kvm, stable, Yazen Ghannam, Brijesh Singh, Tony Luck

On Fri, Jul 08, 2016 at 07:15:39AM -0400, Paolo Bonzini wrote:
> It does sometimes happen that there is no state. For example it could be
> an MSR that we are already getting in and out of KVM.

Right.

> However, it is way more common that you have to add support for
> reading/writing the MSR in KVM as well, and then teach QEMU's
> target-i386/kvm.c about it as well.
>
> It's hard to say without knowing exactly what the feature is about...
> Is there an architecture manual out there that documents it?

Maybe section 2.16 here:

http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf

In any case, here are two bit definitions:

1 SUCCOR: Software uncorrectable error containment and recovery
capability. Value: 1. 1=The processor supports software containment of
uncorrectable errors through context synchronizing data poisoning and
deferred error interrupts; see 2.16.1.10 [Deferred Errors and Data
Poisoning]; MSR MSRC000_0410 [Machine Check Deferred Error Configuration
(CU_DEFER_ERR)] exists.

0 McaOverflowRecov: MCA overflow recovery support. Value: 1. 1=MCA
overflow conditions (MCi_STATUS[Overflow]=1) are not fatal; software may
safely ignore such conditions. 0=MCA overflow conditions require
software to shut down the system. See 2.16.1.6 [Handling Machine Check
Exceptions].

So AFAICT the McaOverflowRecov thing should be the easiest by making
sure MCi_STATUS[Overflow]=1 is set properly when MCEs happen.

The SUCCOR thing needs data poisoning and deferred error interrupts and
that's a lot more involved than the overflow handling. And we'll need to
touch a lot more places.

But it doesn't hurt to start looking at them at least.

Bottom line is, the more RAS features we could test with qemu/kvm the
better because generating those error conditions on a real system is
very very hard and sometimes even impossible. Especially if you try to
inject an error but then the BIOS facility which does that is b0rked
because vendor forgot it. Crap like that.

I'll do some looking into all that when I get free moments, who knows,
we might get something going...

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply [flat|nested] 20+ messages in thread
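Editorial sketch of the two bit definitions quoted in this message. The bit positions (bit 0 McaOverflowRecov, bit 1 SUCCOR) come from the BKDG excerpt above; struct ras_caps, decode_ras_caps() and mca_overflow_is_fatal() are hypothetical names used for illustration only and do not correspond to kernel or QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID[0x80000007].EBX bits, as quoted from the BKDG in this message. */
#define CPUID_OVERFLOW_RECOV (1U << 0) /* McaOverflowRecov */
#define CPUID_SUCCOR         (1U << 1) /* SUCCOR */

/* Hypothetical decoded view of the two capability bits. */
struct ras_caps {
    bool overflow_recov; /* MCA overflow conditions are not fatal */
    bool succor;         /* data poisoning and deferred error interrupts */
};

static struct ras_caps decode_ras_caps(uint32_t ebx)
{
    struct ras_caps caps = {
        .overflow_recov = (ebx & CPUID_OVERFLOW_RECOV) != 0,
        .succor         = (ebx & CPUID_SUCCOR) != 0,
    };
    return caps;
}

/*
 * Policy stated in the McaOverflowRecov definition: an overflow
 * (MCi_STATUS[Overflow]=1) requires shutting down the system unless the
 * processor advertises overflow recovery.
 */
static bool mca_overflow_is_fatal(uint32_t ebx, bool status_overflow)
{
    return status_overflow && !(ebx & CPUID_OVERFLOW_RECOV);
}
```

This only captures the decision the guest kernel makes from the CPUID bit; emulating SUCCOR would additionally need the poisoning and deferred-error machinery Boris mentions, which is not modelled here.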
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-06 13:43 [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX Paolo Bonzini
2016-07-06 14:18 ` Borislav Petkov
@ 2016-07-06 15:00 ` kbuild test robot
2016-07-15 12:15 ` Radim Krčmář
2 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2016-07-06 15:00 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kbuild-all, linux-kernel, kvm, stable, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

Hi,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.7-rc6 next-20160706]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Paolo-Bonzini/KVM-SVM-fix-trashing-of-MSR_TSC_AUX/20160706-214557
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: i386-randconfig-a0-201627 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

arch/x86/kvm/svm.c: In function 'svm_vcpu_run':
>> arch/x86/kvm/svm.c:4549:23: error: implicit declaration of function '__getcpu' [-Werror=implicit-function-declaration]
wrmsrl(MSR_TSC_AUX, __getcpu());
^~~~~~~~
cc1: some warnings being treated as errors

vim +/__getcpu +4549 arch/x86/kvm/svm.c

4543 #else
4544 , "ebx", "ecx", "edx", "esi", "edi"
4545 #endif
4546 );
4547
4548 if (static_cpu_has(X86_FEATURE_RDTSCP))
> 4549 wrmsrl(MSR_TSC_AUX, __getcpu());
4550 #ifdef CONFIG_X86_64
4551 wrmsrl(MSR_GS_BASE, svm->host.gs_base);
4552 #else

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24154 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-06 13:43 [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX Paolo Bonzini
2016-07-06 14:18 ` Borislav Petkov
2016-07-06 15:00 ` kbuild test robot
@ 2016-07-15 12:15 ` Radim Krčmář
2016-07-15 12:30 ` Paolo Bonzini
2 siblings, 1 reply; 20+ messages in thread
From: Radim Krčmář @ 2016-07-15 12:15 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, stable, Borislav Petkov

2016-07-06 15:43+0200, Paolo Bonzini:
> I don't know what I was thinking when I wrote commit 46896c73c1a4 ("KVM:
> svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
> obviously uses MSR_TSC_AUX.
>
> Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.

Hm, MSR_TSC_AUX is in host_save_user_msrs[], so we save it on every
svm_vcpu_load() and restore on svm_vcpu_put(). Linux does not use
RDTSCP and every transition to userspace has svm_vcpu_put() in between.

We also still do "wrmsrl(MSR_TSC_AUX, svm->tsc_aux);" in svm_set_msr()
and can switch to userspace without performing svm_vcpu_run() first.

Was this patch fixing the host userspace or something in the guest?

Thanks.

> Cc: stable@vger.kernel.org
> Cc: Borislav Petkov <bp@alien8.de>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/svm.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 16ef31b87452..44f6368f8b45 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -43,6 +43,7 @@
> #include <asm/kvm_para.h>
>
> #include <asm/virtext.h>
> +#include <asm/vgtod.h>
> #include "trace.h"
>
> #define __ex(x) __kvm_handle_fault_on_reboot(x)
> @@ -1530,9 +1531,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
> }
> }
> - /* This assumes that the kernel never uses MSR_TSC_AUX */
> - if (static_cpu_has(X86_FEATURE_RDTSCP))
> - wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> avic_vcpu_load(vcpu, cpu);
> }
> @@ -4474,6 +4472,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> svm->vmcb->save.cr2 = vcpu->arch.cr2;
>
> clgi();
> + if (static_cpu_has(X86_FEATURE_RDTSCP))
> + wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> local_irq_enable();
>
> @@ -4550,6 +4550,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> );
>
> + if (static_cpu_has(X86_FEATURE_RDTSCP))
> + wrmsrl(MSR_TSC_AUX, __getcpu());
> #ifdef CONFIG_X86_64
> wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> #else
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 20+ messages in thread
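Editorial sketch of the save-on-load, restore-on-put scheme Radim describes in this message. It is a userspace toy, not kernel code: the MSR is simulated by a plain variable where the kernel would use rdmsrl()/wrmsrl(), and struct toy_vcpu, toy_vcpu_load() and toy_vcpu_put() are hypothetical names standing in for the roles of svm_vcpu_load() and svm_vcpu_put().

```c
#include <stdint.h>

/* Simulated MSR_TSC_AUX; an assumption replacing real rdmsrl()/wrmsrl(). */
static uint64_t fake_tsc_aux;

/* Hypothetical per-vCPU state, loosely mirroring what the thread describes. */
struct toy_vcpu {
    uint64_t host_tsc_aux;  /* host value saved at load time */
    uint64_t guest_tsc_aux; /* value the guest expects RDTSCP to return */
};

/* Role of svm_vcpu_load(): save the host value, then install the guest's. */
static void toy_vcpu_load(struct toy_vcpu *v)
{
    v->host_tsc_aux = fake_tsc_aux;
    fake_tsc_aux = v->guest_tsc_aux;
}

/* Role of svm_vcpu_put(): restore the host value before userspace runs. */
static void toy_vcpu_put(struct toy_vcpu *v)
{
    fake_tsc_aux = v->host_tsc_aux;
}
```

Under this scheme the host value is back in place on every return to userspace, which is the basis of Radim's question about what the extra writes in svm_vcpu_run() were meant to fix.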
* Re: [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX
2016-07-15 12:15 ` Radim Krčmář
@ 2016-07-15 12:30 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-15 12:30 UTC (permalink / raw)
To: Radim Krčmář; +Cc: linux-kernel, kvm, stable, Borislav Petkov

On 15/07/2016 14:15, Radim Krčmář wrote:
> 2016-07-06 15:43+0200, Paolo Bonzini:
>> I don't know what I was thinking when I wrote commit 46896c73c1a4 ("KVM:
>> svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
>> obviously uses MSR_TSC_AUX.
>>
>> Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.
>
> Hm, MSR_TSC_AUX is in host_save_user_msrs[], so we save it on every
> svm_vcpu_load() and restore on svm_vcpu_put(). Linux does not use
> RDTSCP and every transition to userspace has svm_vcpu_put() in between.
>
> We also still do "wrmsrl(MSR_TSC_AUX, svm->tsc_aux);" in svm_set_msr()
> and can switch to userspace without performing svm_vcpu_run() first.
>
> Was this patch fixing the host userspace or something in the guest?

I think I was helping someone to debug an AMD failure and missed the
addition to host_save_user_msrs in commit 46896c73c1a4. So feel free to
revert, I guess. Sorry for the mess.

Paolo

> Thanks.
end of thread, other threads:[~2016-07-15 12:30 UTC | newest]

Thread overview: 20+ messages
2016-07-06 13:43 [PATCH] KVM: SVM: fix trashing of MSR_TSC_AUX Paolo Bonzini
2016-07-06 14:18 ` Borislav Petkov
2016-07-06 14:29 ` Paolo Bonzini
2016-07-07 10:41 ` Borislav Petkov
2016-07-07 11:01 ` Paolo Bonzini
2016-07-07 11:47 ` Borislav Petkov
2016-07-07 12:28 ` Paolo Bonzini
2016-07-07 12:47 ` Borislav Petkov
2016-07-07 13:16 ` Paolo Bonzini
2016-07-07 16:01 ` Borislav Petkov
2016-07-07 16:17 ` Paolo Bonzini
2016-07-07 16:27 ` Eduardo Habkost
2016-07-07 17:04 ` Borislav Petkov
2016-07-07 17:43 ` Eduardo Habkost
2016-07-08 11:09 ` Borislav Petkov
2016-07-08 11:15 ` Paolo Bonzini
2016-07-08 12:55 ` Borislav Petkov
2016-07-06 15:00 ` kbuild test robot
2016-07-15 12:15 ` Radim Krčmář
2016-07-15 12:30 ` Paolo Bonzini