* [PATCH 0/5] KVM: x86: support setting the CPL independent of CS
@ 2014-05-13 14:55 Paolo Bonzini
2014-05-13 14:55 ` [PATCH 1/5] KVM: x86: support KVM_ENABLE_CAP on vm file descriptors Paolo Bonzini
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
Until now, KVM used to assume that CS.RPL could always be used as the CPL
value when KVM_SET_SREGS is called. Unfortunately this is not the case.
If userspace decides to call KVM_GET_SREGS/KVM_SET_SREGS exactly after
CR0.PE has been set to 1, but before the long jump that reloads CS, the
CPL will be reset to bits 0-1 of CS (aka CS.RPL). This can work or not,
depending on the placement of the code that transitions to protected
mode. If CS.RPL != 0 the emulator will see CS.RPL != CS.DPL (the DPL
will always be zero) and fail to fetch the next instruction of the
transition code.
The same bug exists with SVM, where you don't have the emulator but the
guest will triple fault. Strangely, it doesn't occur with Intel's
unrestricted guest mode.
To trigger this using QEMU, it is enough to send "info cpus" continuously
while running iPXE (which places its code for real->protected mode in
the EBDA). iPXE does a lot of transitions, and the guest will crash
very quickly.
Avi or Gleb, this is a bit tricky. Can you review it please?
Paolo Bonzini (5):
KVM: x86: support KVM_ENABLE_CAP on vm file descriptors
KVM: x86: get CPL in KVM_GET_SREGS, prepare to be able to set it
KVM: vmx: always compute the CPL on KVM_SET_SREGS
KVM: vmx: force CPL=0 on real->protected mode transition
KVM: x86: add capability to get/set CPL
arch/x86/include/asm/kvm_host.h | 4 ++-
arch/x86/include/uapi/asm/kvm.h | 5 +++-
arch/x86/kvm/svm.c | 13 +++++++--
arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++++++---------------
arch/x86/kvm/x86.c | 53 +++++++++++++++++++++++++++--------
include/uapi/linux/kvm.h | 1 +
6 files changed, 98 insertions(+), 40 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/5] KVM: x86: support KVM_ENABLE_CAP on vm file descriptors
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
@ 2014-05-13 14:55 ` Paolo Bonzini
2014-05-13 14:55 ` [PATCH 2/5] KVM: x86: retrieve CPL in KVM_GET_SREGS, prepare to be able to set it Paolo Bonzini
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
We will use it for CPL save/restore; making it an enabled capability
lets us reuse a padding field in kvm_segment instead of inventing a
new ioctl. This is also simpler because KVM_SET_SREGS already mucks
with the CPL.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e8c74e6a4c1..8e0532fc4d96 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2655,6 +2655,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_READONLY_MEM:
case KVM_CAP_HYPERV_TIME:
case KVM_CAP_IOAPIC_POLARITY_IGNORED:
+ case KVM_CAP_ENABLE_CAP_VM:
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_PCI_2_3:
@@ -3673,6 +3674,21 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
return 0;
}
+static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+ int r;
+
+ if (cap->flags)
+ return -EINVAL;
+
+ switch (cap->cap) {
+ default:
+ r = -EINVAL;
+ break;
+ }
+ return r;
+}
+
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -3709,6 +3725,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_GET_NR_MMU_PAGES:
r = kvm_vm_ioctl_get_nr_mmu_pages(kvm);
break;
+ case KVM_ENABLE_CAP: {
+ struct kvm_enable_cap cap;
+ r = -EFAULT;
+ if (copy_from_user(&cap, argp, sizeof(cap)))
+ break;
+ r = kvm_vm_ioctl_enable_cap(kvm, &cap);
+ break;
+ }
case KVM_CREATE_IRQCHIP: {
struct kvm_pic *vpic;
--
1.8.3.1
* [PATCH 2/5] KVM: x86: retrieve CPL in KVM_GET_SREGS, prepare to be able to set it
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
2014-05-13 14:55 ` [PATCH 1/5] KVM: x86: support KVM_ENABLE_CAP on vm file descriptors Paolo Bonzini
@ 2014-05-13 14:55 ` Paolo Bonzini
2014-05-13 14:55 ` [PATCH 3/5] KVM: vmx: always compute the CPL on KVM_SET_SREGS Paolo Bonzini
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
For the set path, this patch never uses the new code so there
is no semantic change.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 5 ++++-
arch/x86/kvm/svm.c | 13 ++++++++++---
arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++-------------
arch/x86/kvm/x86.c | 24 ++++++++++++------------
5 files changed, 50 insertions(+), 30 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e21aee98a5c2..0bc2d91c8a97 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -689,7 +689,7 @@ struct kvm_x86_ops {
struct kvm_segment *var, int seg);
int (*get_cpl)(struct kvm_vcpu *vcpu);
void (*set_segment)(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg);
+ struct kvm_segment *var, int seg, bool set_cpl);
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu);
void (*decache_cr3)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d3a87780c70b..4e80c624d0b3 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -126,7 +126,10 @@ struct kvm_segment {
__u8 type;
__u8 present, dpl, db, s, l, g, avl;
__u8 unusable;
- __u8 padding;
+ union {
+ __u8 padding;
+ __u8 cpl;
+ };
};
struct kvm_dtable {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0b7d58d0c5fb..647e47653a9b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -203,6 +203,7 @@ module_param(nested, int, S_IRUGO);
static void svm_flush_tlb(struct kvm_vcpu *vcpu);
static void svm_complete_interrupts(struct vcpu_svm *svm);
+static int svm_get_cpl(struct kvm_vcpu *vcpu);
static int nested_svm_exit_handled(struct vcpu_svm *svm);
static int nested_svm_intercept(struct vcpu_svm *svm);
@@ -1445,6 +1446,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
* Intel's VMENTRY has a check on the 'G' bit.
*/
var->g = s->limit > 0xfffff;
+ var->cpl = svm_get_cpl(vcpu);
break;
case VCPU_SREG_TR:
/*
@@ -1611,7 +1613,8 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
}
static void svm_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
+ struct kvm_segment *var, int seg,
+ bool set_cpl)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_seg *s = svm_seg(vcpu, seg);
@@ -1631,8 +1634,12 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
s->attrib |= (var->db & 1) << SVM_SELECTOR_DB_SHIFT;
s->attrib |= (var->g & 1) << SVM_SELECTOR_G_SHIFT;
}
- if (seg == VCPU_SREG_CS)
- svm_update_cpl(vcpu);
+ if (seg == VCPU_SREG_CS) {
+ if (set_cpl)
+ svm->vmcb->save.cpl = var->cpl;
+ else
+ svm_update_cpl(vcpu);
+ }
mark_dirty(svm->vmcb, VMCB_SEG);
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a56a4acda82c..10256c27694d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -733,9 +733,10 @@ static void kvm_cpu_vmxoff(void);
static bool vmx_mpx_supported(void);
static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
static void vmx_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg);
+ struct kvm_segment *var, int seg, bool set_cpl);
static void vmx_get_segment(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
+static int vmx_get_cpl(struct kvm_vcpu *vcpu);
static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
@@ -3109,7 +3110,7 @@ static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
save->dpl = save->selector & SELECTOR_RPL_MASK;
save->s = 1;
}
- vmx_set_segment(vcpu, save, seg);
+ vmx_set_segment(vcpu, save, seg, false);
}
static void enter_pmode(struct kvm_vcpu *vcpu)
@@ -3132,7 +3133,7 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
vmx_segment_cache_clear(vmx);
- vmx_set_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR);
+ vmx_set_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR, false);
flags = vmcs_readl(GUEST_RFLAGS);
flags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
@@ -3538,6 +3539,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
var->l = (ar >> 13) & 1;
var->db = (ar >> 14) & 1;
var->g = (ar >> 15) & 1;
+
+ if (seg == VCPU_SREG_CS)
+ var->cpl = vmx_get_cpl(vcpu);
}
static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
@@ -3592,14 +3596,20 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var)
}
static void vmx_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
+ struct kvm_segment *var, int seg,
+ bool set_cpl)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
vmx_segment_cache_clear(vmx);
- if (seg == VCPU_SREG_CS)
- __clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+ if (seg == VCPU_SREG_CS) {
+ if (set_cpl) {
+ __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+ vmx->cpl = var->cpl;
+ } else
+ __clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+ }
if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
vmx->rmode.segs[seg] = *var;
@@ -8600,7 +8610,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
seg.l = 1;
else
seg.db = 1;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_CS);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_CS, false);
seg = (struct kvm_segment) {
.base = 0,
.limit = 0xFFFFFFFF,
@@ -8611,17 +8621,17 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
.g = 1
};
seg.selector = vmcs12->host_ds_selector;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_DS);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_DS, false);
seg.selector = vmcs12->host_es_selector;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_ES);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_ES, false);
seg.selector = vmcs12->host_ss_selector;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_SS);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_SS, false);
seg.selector = vmcs12->host_fs_selector;
seg.base = vmcs12->host_fs_base;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_FS);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_FS, false);
seg.selector = vmcs12->host_gs_selector;
seg.base = vmcs12->host_gs_base;
- vmx_set_segment(vcpu, &seg, VCPU_SREG_GS);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_GS, false);
seg = (struct kvm_segment) {
.base = vmcs12->host_tr_base,
.limit = 0x67,
@@ -8629,7 +8639,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
.type = 11,
.present = 1
};
- vmx_set_segment(vcpu, &seg, VCPU_SREG_TR);
+ vmx_set_segment(vcpu, &seg, VCPU_SREG_TR, false);
kvm_set_dr(vcpu, 7, 0x400);
vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e0532fc4d96..ca0a1d38fa51 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4033,9 +4033,9 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
}
static void kvm_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
+ struct kvm_segment *var, int seg, int set_cpl)
{
- kvm_x86_ops->set_segment(vcpu, var, seg);
+ kvm_x86_ops->set_segment(vcpu, var, seg, set_cpl);
}
void kvm_get_segment(struct kvm_vcpu *vcpu,
@@ -4848,7 +4848,7 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
var.unusable = !var.present;
var.padding = 0;
- kvm_set_segment(vcpu, &var, seg);
+ kvm_set_segment(vcpu, &var, seg, false);
return;
}
@@ -6678,15 +6678,15 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
pr_debug("Set back pending irq %d\n", pending_vec);
}
- kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
- kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
- kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
- kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
- kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
- kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+ kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS, false);
+ kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS, false);
+ kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES, false);
+ kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS, false);
+ kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS, false);
+ kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS, false);
- kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
- kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+ kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR, false);
+ kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR, false);
update_cr8_intercept(vcpu);
@@ -6976,7 +6976,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
cs.selector = vector << 8;
cs.base = vector << 12;
- kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
+ kvm_set_segment(vcpu, &cs, VCPU_SREG_CS, false);
kvm_rip_write(vcpu, 0);
}
--
1.8.3.1
* [PATCH 3/5] KVM: vmx: always compute the CPL on KVM_SET_SREGS
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
2014-05-13 14:55 ` [PATCH 1/5] KVM: x86: support KVM_ENABLE_CAP on vm file descriptors Paolo Bonzini
2014-05-13 14:55 ` [PATCH 2/5] KVM: x86: retrieve CPL in KVM_GET_SREGS, prepare to be able to set it Paolo Bonzini
@ 2014-05-13 14:55 ` Paolo Bonzini
2014-05-13 14:55 ` [PATCH 4/5] KVM: vmx: force CPL=0 on real->protected mode transition Paolo Bonzini
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
It is not obvious that the CPL will be guessed right the next time
we run vmx_get_cpl. Make the code similar to SVM, and do the "guess"
already during vmx_set_segment while we have the correct value of CR0.PE.
For unrestricted guest mode, there is still a small window where CR0.PE
has already been set and we could set the CS selector to a real mode
value. It is not clear
how the processor could know the correct CPL on the next vmentry in this
case; the only way to solve it would be to force invalid guest state
in that case and emulate protected mode entry until the long jump.
Do not do this yet, since unrestricted guest seems to work with the
test case that prompted this change.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/vmx.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 10256c27694d..7dc5fdd30d7f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3603,14 +3603,6 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
vmx_segment_cache_clear(vmx);
- if (seg == VCPU_SREG_CS) {
- if (set_cpl) {
- __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
- vmx->cpl = var->cpl;
- } else
- __clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
- }
-
if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
vmx->rmode.segs[seg] = *var;
if (seg == VCPU_SREG_TR)
@@ -3640,6 +3632,14 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
+ if (seg == VCPU_SREG_CS) {
+ __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+ if (set_cpl)
+ vmx->cpl = var->cpl;
+ else
+ vmx->cpl = vmx_get_cpl(vcpu);
+ }
+
out:
vmx->emulation_required = emulation_required(vcpu);
}
--
1.8.3.1
* [PATCH 4/5] KVM: vmx: force CPL=0 on real->protected mode transition
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
` (2 preceding siblings ...)
2014-05-13 14:55 ` [PATCH 3/5] KVM: vmx: always compute the CPL on KVM_SET_SREGS Paolo Bonzini
@ 2014-05-13 14:55 ` Paolo Bonzini
2014-05-13 14:55 ` [PATCH 5/5] KVM: x86: add capability to get/set CPL Paolo Bonzini
2014-05-14 7:47 ` [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
When writing 1 to CR0.PE, the CPL remains 0 even if bits 0-1 of CS
disagree. Before calling vmx_set_cr0, VCPU_EXREG_CPL was cleared
by vmx_vcpu_run, so set it again and make sure that enter_pmode
does not change it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/vmx.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7dc5fdd30d7f..7440cce3bf30 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3102,7 +3102,7 @@ static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
* CS and SS RPL should be equal during guest entry according
* to VMX spec, but in reality it is not always so. Since vcpu
* is in the middle of the transition from real mode to
- * protected mode it is safe to assume that RPL 0 is a good
+ * protected mode, the CPL is 0; thus RPL 0 is a good
* default value.
*/
if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS)
@@ -3110,7 +3110,7 @@ static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
save->dpl = save->selector & SELECTOR_RPL_MASK;
save->s = 1;
}
- vmx_set_segment(vcpu, save, seg, false);
+ vmx_set_segment(vcpu, save, seg, true);
}
static void enter_pmode(struct kvm_vcpu *vcpu)
@@ -3151,10 +3151,6 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
fix_pmode_seg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
fix_pmode_seg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
fix_pmode_seg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
-
- /* CPL is always 0 when CPU enters protected mode */
- __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
- vmx->cpl = 0;
}
static void fix_rmode_seg(int seg, struct kvm_segment *save)
@@ -3397,11 +3393,21 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
else {
hw_cr0 |= KVM_VM_CR0_ALWAYS_ON;
- if (vmx->rmode.vm86_active && (cr0 & X86_CR0_PE))
- enter_pmode(vcpu);
+ if (cr0 & X86_CR0_PE) {
+ /*
+ * CPL is always 0 when CPU enters protected
+ * mode, bits 0-1 of CS do not matter.
+ */
+ __set_bit(VCPU_EXREG_CPL,
+ (ulong *)&vcpu->arch.regs_avail);
+ vmx->cpl = 0;
- if (!vmx->rmode.vm86_active && !(cr0 & X86_CR0_PE))
- enter_rmode(vcpu);
+ if (vmx->rmode.vm86_active)
+ enter_pmode(vcpu);
+ } else {
+ if (!vmx->rmode.vm86_active)
+ enter_rmode(vcpu);
+ }
}
#ifdef CONFIG_X86_64
--
1.8.3.1
* [PATCH 5/5] KVM: x86: add capability to get/set CPL
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
` (3 preceding siblings ...)
2014-05-13 14:55 ` [PATCH 4/5] KVM: vmx: force CPL=0 on real->protected mode transition Paolo Bonzini
@ 2014-05-13 14:55 ` Paolo Bonzini
2014-05-14 7:47 ` [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-13 14:55 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
Until now, KVM used to assume that CS.RPL could always be used as the CPL
value when KVM_SET_SREGS is called. Unfortunately this is not the case.
If userspace decides to call KVM_GET_SREGS/KVM_SET_SREGS exactly after
CR0.PE has been set to 1, but before the long jump that reloads CS, the
CPL will be reset to bits 0-1 of CS (aka CS.RPL). This can work or not,
depending on the placement of the code that transitions to protected
mode. If CS.RPL != 0 the emulator will see CS.RPL != CS.DPL (the DPL
will always be zero) and fail to fetch the next instruction of the
transition code.
To trigger this using QEMU, it is enough to send "info cpus" continuously
while running iPXE (which places its code for real->protected mode in
the EBDA). iPXE does a lot of transitions, and the guest will crash
very quickly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/x86.c | 7 ++++++-
include/uapi/linux/kvm.h | 1 +
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0bc2d91c8a97..5a85423f4e65 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -574,6 +574,8 @@ struct kvm_arch {
struct mutex apic_map_lock;
struct kvm_apic_map *apic_map;
+ bool set_cpl;
+
unsigned int tss_addr;
struct page *apic_access_page;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca0a1d38fa51..94c6c77e7a9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2656,6 +2656,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_HYPERV_TIME:
case KVM_CAP_IOAPIC_POLARITY_IGNORED:
case KVM_CAP_ENABLE_CAP_VM:
+ case KVM_CAP_X86_CPL:
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_PCI_2_3:
@@ -3682,6 +3683,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
return -EINVAL;
switch (cap->cap) {
+ case KVM_CAP_X86_CPL:
+ kvm->arch.set_cpl = 1;
+ r = 0;
+ break;
default:
r = -EINVAL;
break;
@@ -6678,7 +6683,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
pr_debug("Set back pending irq %d\n", pending_vec);
}
- kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS, false);
+ kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS, vcpu->kvm->arch.set_cpl);
kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS, false);
kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES, false);
kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS, false);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2b83cf35437a..4bcf34aa1b3b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -748,6 +748,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_S390_IRQCHIP 99
#define KVM_CAP_IOEVENTFD_NO_LENGTH 100
#define KVM_CAP_VM_ATTRIBUTES 101
+#define KVM_CAP_X86_CPL 102
#ifdef KVM_CAP_IRQ_ROUTING
--
1.8.3.1
* Re: [PATCH 0/5] KVM: x86: support setting the CPL independent of CS
2014-05-13 14:55 [PATCH 0/5] KVM: x86: support setting the CPL independent of CS Paolo Bonzini
` (4 preceding siblings ...)
2014-05-13 14:55 ` [PATCH 5/5] KVM: x86: add capability to get/set CPL Paolo Bonzini
@ 2014-05-14 7:47 ` Paolo Bonzini
5 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2014-05-14 7:47 UTC (permalink / raw)
To: linux-kernel; +Cc: jan.kiszka, kvm, gleb, avi.kivity
On 13/05/2014 16:55, Paolo Bonzini wrote:
> Until now, KVM used to assume that CS.RPL could always be used as the CPL
> value when KVM_SET_SREGS is called. Unfortunately this is not the case.
> If userspace decides to call KVM_GET_SREGS/KVM_SET_SREGS exactly after
> CR0.PE has been set to 1, but before the long jump that reloads CS, the
> CPL will be reset to bits 0-1 of CS (aka CS.RPL). This can work or not,
> depending on the placement of the code that transitions to protected
> mode. If CS.RPL != 0 the emulator will see CS.RPL != CS.DPL (the DPL
> will always be zero) and fail to fetch the next instruction of the
> transition code.
>
> The same bug exists with SVM, where you don't have the emulator but the
> guest will triple fault. Strangely, it doesn't occur with Intel's
> unrestricted guest mode.
>
> To trigger this using QEMU, it is enough to send "info cpus" continuously
> while running iPXE (which places its code for real->protected mode in
> the EBDA). iPXE does a lot of transitions, and the guest will crash
> very quickly.
>
> Avi or Gleb, this is a bit tricky. Can you review it please?
There's a simpler way to fix this, by using SS.DPL as the CPL. Patch on
its way...
Paolo