public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/7] KVM: x86: expose CVE-2017-5715 ("Spectre variant 2") mitigations to guest
@ 2018-01-08 18:08 Paolo Bonzini
  2018-01-08 18:08 ` [PATCH 1/7] KVM: x86: add SPEC_CTRL and IBPB_SUPPORT accessors Paolo Bonzini
                   ` (7 more replies)
  0 siblings, 8 replies; 61+ messages in thread
From: Paolo Bonzini @ 2018-01-08 18:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: jmattson, aliguori, thomas.lendacky, dwmw, bp

This series allows guests to use the MSR_IA32_SPEC_CTRL and
MSR_IA32_PRED_CMD model specific registers that were added as mitigations
for CVE-2017-5715.

These are only the KVM specific parts of the fix.  It does *not* yet
include any protection for reading host memory from the guest, because
that would be done in the same way as the rest of Linux.  So there is no
IBRS *usage* here, no retpolines, no stuffing of the return stack buffer.
(KVM already includes a fix to clear all registers on vmexit, which is
enough to block Google Project Zero's PoC exploit).

However, I am including the changes to use IBPB (indirect branch
predictor barrier) if available.  That occurs only when there is a VCPU
switch on a physical CPU, thus it has a small impact on performance.

The patches are a bit hackish because the relevant cpufeatures have
not been included yet, and because I wanted to make the patches easier
to backport to distro kernels if desired, but I would still like to
have them in 4.16.

Please review.

Thanks,

Paolo

Paolo Bonzini (5):
  KVM: x86: add SPEC_CTRL and IBPB_SUPPORT accessors
  x86/msr: add definitions for indirect branch predictor MSRs
  kvm: vmx: pass MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD down to the guest
  kvm: svm: pass MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD down to guest
  KVM: x86: add SPEC_CTRL and IBPB_SUPPORT to MSR and CPUID lists

Tim Chen (1):
  kvm: vmx: Set IBPB when running a different VCPU

Tom Lendacky (1):
  x86/svm: Set IBPB when running a different VCPU

 arch/x86/include/asm/msr-index.h |  5 ++++
 arch/x86/kvm/cpuid.c             | 27 +++++++++++++----
 arch/x86/kvm/cpuid.h             | 22 ++++++++++++++
 arch/x86/kvm/svm.c               | 65 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx.c               | 41 +++++++++++++++++++++++++
 arch/x86/kvm/x86.c               |  1 +
 6 files changed, 154 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 11:31 Liran Alon
  2018-01-09 11:41 ` Paolo Bonzini
  0 siblings, 1 reply; 61+ messages in thread
From: Liran Alon @ 2018-01-09 11:31 UTC (permalink / raw)
  To: pbonzini; +Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, linux-kernel, kvm


----- pbonzini@redhat.com wrote:

> On 08/01/2018 21:00, Liran Alon wrote:
> > 
> > 
> > On 08/01/18 20:08, Paolo Bonzini wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
> >> going to run a VCPU different from what was previously run.  Nested
> >> virtualization uses the same VMCB for the second level guest, but the
> >> L1 hypervisor should be using IBRS to protect itself.
> >>
> >> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
> >>   1 file changed, 31 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 779981a00d01..dd744d896cec 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
> >>   module_param(vgif, int, 0444);
> >>
> >>   static bool __read_mostly have_spec_ctrl;
> >> +static bool __read_mostly have_ibpb_support;
> >>
> >>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
> >>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
> >> @@ -540,6 +541,7 @@ struct svm_cpu_data {
> >>       struct kvm_ldttss_desc *tss_desc;
> >>
> >>       struct page *save_area;
> >> +    struct vmcb *current_vmcb;
> >>   };
> >>
> >>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> >> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
> >>           pr_info("kvm: SPEC_CTRL available\n");
> >>       else
> >>           pr_info("kvm: SPEC_CTRL not available\n");
> >> +    have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
> >> +    if (have_ibpb_support)
> >> +        pr_info("kvm: IBPB_SUPPORT available\n");
> >> +    else
> >> +        pr_info("kvm: IBPB_SUPPORT not available\n");
> >>
> >>       return 0;
> >>
> >> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
> >>       __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> >>       kvm_vcpu_uninit(vcpu);
> >>       kmem_cache_free(kvm_vcpu_cache, svm);
> >> +
> >> +    /*
> >> +     * The VMCB could be recycled, causing a false negative in
> >> +     * svm_vcpu_load; block speculative execution.
> >> +     */
> >> +    if (have_ibpb_support)
> >> +        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> >>   }
> >>
> >>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>   {
> >>       struct vcpu_svm *svm = to_svm(vcpu);
> >> +    struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
> >>       int i;
> >>
> >>       if (unlikely(cpu != vcpu->cpu)) {
> >> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>       if (static_cpu_has(X86_FEATURE_RDTSCP))
> >>           wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
> >>
> >> +    if (sd->current_vmcb != svm->vmcb) {
> >> +        sd->current_vmcb = svm->vmcb;
> >> +        if (have_ibpb_support)
> >> +            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> >> +    }
> >> +
> >>       avic_vcpu_load(vcpu, cpu);
> >>   }
> >>
> >> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
> >>       if (!nested_vmcb)
> >>           return 1;
> >>
> >> +    /*
> >> +     * No need for IBPB here, the L1 hypervisor should be running with
> >> +     * IBRS=1 and inserts one already when switching L2 VMs.
> >> +     */
> >> +
> > 
> > I am not fully convinced yet this is OK from a security perspective.
> > From the CPU point of view, both L1 & L2 run in the same prediction
> > mode (as they are both running in SVM guest mode). Therefore, IBRS
> > will not hide the BHB & BTB of L2 from L1.
> 
> Indeed, IBRS will not hide the indirect branch predictor state of L2
> user mode from L1 user mode.
> 
> On current generation processors, as I understand it, IBRS simply
> disables the indirect branch predictor when set to 1.  Therefore, as

This is not how I understood what IBRS does.

Intel (not sure about AMD) states that if IBRS is set, indirect branches will not allow their predicted target address to be controlled by code that executed in a less privileged prediction mode before IBRS was last written with a value of 1.
(Intel also states that, therefore, an indirect branch may still be affected by code in a less privileged prediction mode that executed AFTER IBRS was last written with a value of 1.)

Therefore, L2 could also inject BTB/BHB entries that will be used by L1:
Consider the case that L2 injects values into the BTB/BHB and then issues a hypercall which causes a #VMExit that is forwarded to L1. Because L2 & L1 run in the same prediction mode (both Ring0 SVM guest mode) from the physical CPU's perspective, Ring0 L1 code could use BTB/BHB entries injected by Ring0 L2.
(The fact that L0 has set IBRS to 1 when exiting from L2 to L0 doesn't prevent those entries from being used by L1 after L0 enters L1.)

This is different from what happens in the non-nested case, as L0 & L1 run in different physical prediction modes, and therefore setting IBRS=1 on every #VMExit should suffice.

Therefore, I don't understand how L1 setting IBRS to 1 helps in this case.
Maybe this mechanism works differently on AMD?

-Liran

> long as the L1 hypervisor sets IBRS=1, the indirect branch predictor
> state left by L2 does not affect execution of the L1 hypervisor.
> 
> Future processors might have a mode that says "just set IBRS=1 and I'll
> DTRT".  If that's done by indexing the BTB using the full virtual
> address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed here
> because the L2 VM uses a different ASID.  If that's done by only
> augmenting the BTB index with the CPL, we'd need an IBPB.  But in this
> new world we've been thrown into, that would be a processor bug...
> 
> Note that AMD currently doesn't implement IBRS at all.  In order to
> mitigate against CVE-2017-5715, the hypervisor should issue an IBPB on
> every vmexit (in both L0 and L1).  However, the cost of that is very
> high.  While we may want a paranoia mode that does it, it is outside the
> scope of this series (which is more of a "let's unblock QEMU patches"
> thing than a complete fix).
> 
> > 6. One can implicitly assume that the L1 hypervisor issued an IBPB
> > when loading a VMCB and therefore it doesn't have to do it again in
> > L0.  However, I see 2 problems with this approach:
> > (a) We don't protect old non-patched L1 hypervisors from Spectre even
> > though we could easily do that from L0 with a small performance hit.
> 
> Yeah, that's a nice thing to have.  However, I disagree on the "small"
> performance hit... on a patched hypervisor, the cost of a double fix can
> be very noticeable.
> 
> > (b) When the L1 hypervisor runs only a single L2 VM, it doesn't have
> > to issue an IBPB when loading the VMCB (as it didn't run any other
> > VMCB before), and it should be sufficient from L1's perspective to
> > just write 1 to IBRS. However, in our nested case, this is a security
> > hole.
> 
> This is the main difference in our reasoning.  I think IBRS=1 (or IBPB
> on vmexit) should be enough.
> 
> Paolo
> 
> > Am I misunderstanding something?
> > 
> > Regards,
> > -Liran
> > 
> >>       /* Exit Guest-Mode */
> >>       leave_guest_mode(&svm->vcpu);
> >>       svm->nested.vmcb = 0;
> >> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
> >>       if (!nested_vmcb)
> >>           return false;
> >>
> >> +    /*
> >> +     * No need for IBPB here, since the nested VM is less
> >> privileged.  The
> >> +     * L1 hypervisor inserts one already when switching L2 VMs.
> >> +     */
> >> +
> >>       if (!nested_vmcb_checks(nested_vmcb)) {
> >>           nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
> >>           nested_vmcb->control.exit_code_hi = 0;
> >>
> >

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 15:00 Liran Alon
  2018-01-09 15:19 ` Arjan van de Ven
  0 siblings, 1 reply; 61+ messages in thread
From: Liran Alon @ 2018-01-09 15:00 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> > The above ("IBRS simply disables the indirect branch predictor") was
> > my take-away message from private discussion with Intel.  My guess is
> > that the vendors are just handwaving a spec that doesn't match what
> > they have implemented, because honestly a microcode update is unlikely
> > to do much more than an old-fashioned chicken bit.  Maybe on Skylake
> > it does though, since the performance characteristics of IBRS are so
> > different from previous processors.  Let's ask Arjan who might have
> > more information about it, and hope he actually can disclose it...
> 
> IBRS will ensure that, when set after the ring transition, no earlier
> branch prediction data is used for indirect branches while IBRS is set

Consider the following scenario:
1. L1 runs with IBRS=1 in Ring0.
2. L1 restores L2's SPEC_CTRL and enters L2.
3. L1's VMRUN exits into L0, which backs up L1's SPEC_CTRL and enters L2 (using the same VMCB).
4. L2 populates the BTB/BHB with values and causes a hypercall, which #VMExits into L0.
5. L0 backs up L2's SPEC_CTRL and writes IBRS=1.
6. L0 restores L1's SPEC_CTRL and enters L1.
7. L1 backs up L2's SPEC_CTRL and writes IBRS=1.

Just to make sure I understand:
You state that L2's BTB/BHB entries won't be used by L1 because L1 set IBRS=1 in step (7).
And that is even though L1 & L2 could both be running in SVM guest mode & Ring0 from the physical CPU's perspective, and therefore in the same prediction mode.

So basically you are saying that IBRS doesn't just avoid using BTB/BHB entries from lower prediction modes, but instead avoids using all BTB/BHB entries while IBRS is set.

Did I understand everything correctly?

Thanks,
-Liran

> 
> (this is an English summary of two pages of technical spec so it lacks
> the language-lawyer precision)
> 
> because of this promise, the implementation tends to be impactful
> and it is very strongly recommended that retpoline is used instead of
> IBRS.
> (with all the caveats already on lkml)
> 
> the IBPB is different, this is a convenient thing for switching between
> VM guests etc

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 15:33 Liran Alon
  2018-01-09 15:56 ` Arjan van de Ven
  0 siblings, 1 reply; 61+ messages in thread
From: Liran Alon @ 2018-01-09 15:33 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> On 1/9/2018 7:00 AM, Liran Alon wrote:
> > 
> > ----- arjan@linux.intel.com wrote:
> > 
> >> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> >>> The above ("IBRS simply disables the indirect branch predictor") was
> >>> my take-away message from private discussion with Intel.  My guess
> >>> is that the vendors are just handwaving a spec that doesn't match
> >>> what they have implemented, because honestly a microcode update is
> >>> unlikely to do much more than an old-fashioned chicken bit.  Maybe
> >>> on Skylake it does though, since the performance characteristics of
> >>> IBRS are so different from previous processors.  Let's ask Arjan who
> >>> might have more information about it, and hope he actually can
> >>> disclose it...
> >>
> >> IBRS will ensure that, when set after the ring transition, no earlier
> >> branch prediction data is used for indirect branches while IBRS is
> >> set
> > 
> > Consider the following scenario:
> > 1. L1 runs with IBRS=1 in Ring0.
> > 2. L1 restores L2's SPEC_CTRL and enters L2.
> > 3. L1's VMRUN exits into L0, which backs up L1's SPEC_CTRL and enters
> > L2 (using the same VMCB).
> > 4. L2 populates the BTB/BHB with values and causes a hypercall, which
> > #VMExits into L0.
> > 5. L0 backs up L2's SPEC_CTRL and writes IBRS=1.
> > 6. L0 restores L1's SPEC_CTRL and enters L1.
> > 7. L1 backs up L2's SPEC_CTRL and writes IBRS=1.
> > 
> 
> I'm sorry I'm not familiar with your L0/L1/L2 terminology
> (maybe it's before coffee has had time to permeate the brain)

This is standard terminology for guest levels:
L0 == hypervisor that runs on bare metal.
L1 == hypervisor that runs as an L0 guest.
L2 == software that runs as an L1 guest.
(We are talking about nested virtualization here.)

-Liran

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 16:01 Liran Alon
  0 siblings, 0 replies; 61+ messages in thread
From: Liran Alon @ 2018-01-09 16:01 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> >> I'm sorry I'm not familiar with your L0/L1/L2 terminology
> >> (maybe it's before coffee has had time to permeate the brain)
> > 
> > These are standard terminology for guest levels:
> > L0 == hypervisor that runs on bare-metal
> > L1 == hypervisor that runs as L0 guest.
> > L2 == software that runs as L1 guest.
> > (We are talking about nested virtualization here)
> 
> 1. I really, really hope that the guests don't use IBRS but use
> retpoline. At least for Linux that is going to be the preferred
> approach.
> 
> 2. For the CPU, there really is only "bare metal" vs "guest"; all
> guests are "guests" no matter how deeply nested. So for the language
> of privilege domains etc, nested guests equal their parent.

So in the scenario I mentioned above, would L1 use BTB/BHB entries inserted by L2?
To me it seems that it would, if IBRS takes the prediction mode into consideration.
Therefore, we must issue an IBPB when switching between L1 & L2,
the same as happens on nVMX when switching between vmcs01 & vmcs02.

-Liran


end of thread, other threads:[~2018-01-11 10:41 UTC | newest]

Thread overview: 61+ messages
2018-01-08 18:08 [PATCH 0/7] KVM: x86: expose CVE-2017-5715 ("Spectre variant 2") mitigations to guest Paolo Bonzini
2018-01-08 18:08 ` [PATCH 1/7] KVM: x86: add SPEC_CTRL and IBPB_SUPPORT accessors Paolo Bonzini
2018-01-08 18:33   ` Konrad Rzeszutek Wilk
2018-01-08 19:09   ` Liran Alon
2018-01-09 10:32     ` Paolo Bonzini
2018-01-09 11:14   ` David Hildenbrand
2018-01-09 11:18     ` Paolo Bonzini
2018-01-08 18:08 ` [PATCH 2/7] x86/msr: add definitions for indirect branch predictor MSRs Paolo Bonzini
2018-01-08 18:35   ` Konrad Rzeszutek Wilk
2018-01-08 18:52     ` Jim Mattson
2018-01-08 19:10   ` Liran Alon
2018-01-08 18:08 ` [PATCH 3/7] kvm: vmx: pass MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD down to the guest Paolo Bonzini
2018-01-08 18:43   ` Konrad Rzeszutek Wilk
2018-01-08 19:18   ` Jim Mattson
2018-01-08 20:23     ` Liran Alon
2018-01-08 22:32     ` Paolo Bonzini
2018-01-08 23:19       ` Jim Mattson
2018-01-09 10:11         ` Paolo Bonzini
2018-01-08 19:22   ` Liran Alon
2018-01-08 19:41   ` David Woodhouse
2018-01-08 22:33     ` Paolo Bonzini
2018-01-08 22:09   ` Ashok Raj
2018-01-08 22:25     ` Paolo Bonzini
2018-01-11  2:47   ` Tim Chen
2018-01-11 10:41     ` Paolo Bonzini
2018-01-08 18:08 ` [PATCH 4/7] kvm: vmx: Set IBPB when running a different VCPU Paolo Bonzini
2018-01-08 19:23   ` Liran Alon
2018-01-08 19:36   ` Jim Mattson
2018-01-09  8:33     ` Paolo Bonzini
2018-01-09 11:01   ` David Hildenbrand
2018-01-08 18:08 ` [PATCH 5/7] kvm: svm: pass MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD down to guest Paolo Bonzini
2018-01-08 19:41   ` Liran Alon
2018-01-08 18:08 ` [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Paolo Bonzini
2018-01-08 20:00   ` Liran Alon
2018-01-09 11:07     ` Paolo Bonzini
2018-01-08 18:08 ` [PATCH 7/7] KVM: x86: add SPEC_CTRL and IBPB_SUPPORT to MSR and CPUID lists Paolo Bonzini
2018-01-08 20:07   ` Liran Alon
2018-01-08 20:15     ` Jim Mattson
2018-01-09 10:15 ` [PATCH 0/7] KVM: x86: expose CVE-2017-5715 ("Spectre variant 2") mitigations to guest Thomas Gleixner
2018-01-09 11:12   ` Paolo Bonzini
2018-01-09 12:03     ` Thomas Gleixner
2018-01-09 14:06       ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2018-01-09 11:31 [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Liran Alon
2018-01-09 11:41 ` Paolo Bonzini
2018-01-09 14:30   ` Arjan van de Ven
2018-01-09 15:00 Liran Alon
2018-01-09 15:19 ` Arjan van de Ven
2018-01-09 16:17   ` Paolo Bonzini
2018-01-09 16:23     ` Arjan van de Ven
2018-01-09 16:49       ` Paolo Bonzini
2018-01-09 20:39         ` Konrad Rzeszutek Wilk
2018-01-09 20:47           ` Konrad Rzeszutek Wilk
2018-01-09 20:57             ` Jim Mattson
2018-01-09 21:11               ` Konrad Rzeszutek Wilk
2018-01-09 21:19                 ` Jim Mattson
2018-01-09 21:42               ` Paolo Bonzini
2018-01-09 21:59                 ` Jim Mattson
2018-01-09 21:56           ` Paolo Bonzini
2018-01-09 15:33 Liran Alon
2018-01-09 15:56 ` Arjan van de Ven
2018-01-09 16:01 Liran Alon
