* [PATCH v8 0/3] x86, apicv: Add APIC virtualization support
@ 2013-01-07 2:02 Yang Zhang
2013-01-07 2:02 ` [PATCH v8 1/3] x86, apicv: add APICv register " Yang Zhang
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Yang Zhang @ 2013-01-07 2:02 UTC (permalink / raw)
To: kvm; +Cc: gleb, haitao.shan, mtosatti, Yang Zhang
From: Yang Zhang <yang.z.zhang@Intel.com>
APIC virtualization is a new hardware feature that eliminates most VM exits
when a vcpu handles an interrupt:

APIC register virtualization:
- APIC read accesses do not cause APIC-access VM exits.
- APIC writes become trap-like.

Virtual interrupt delivery:
- Virtual interrupt delivery removes the need for KVM to inject vAPIC
  interrupts manually; delivery is handled entirely by hardware.

Please refer to Intel SDM volume 3, chapter 29 for more details.
Changes v7 to v8:
* Following Marcelo's suggestion, add comments for irr_pending and isr_count,
  since the two variables have different meanings when apicv is in use.
* Set the highest bit in vISR to SVI after migration.
* Use a spinlock to synchronize access to the eoi exit bitmap.
* Enable virtualized x2apic mode when the guest is using x2apic.
* Rebased on top of KVM upstream.
Changes v6 to v7:
* Fix a bug when setting the exit bitmap.
* Rebased on top of KVM upstream.
Changes v5 to v6:
* Minor adjustments according to Gleb's comments.
* Rebased on top of KVM upstream.
Changes v4 to v5:
* Set the eoi exit bitmap when an interrupt has a notifier registered.
* Use a request to track modifications to ioapic entries.
* Rebased on top of KVM upstream.
Changes v3 to v4:
* Use one option to control both register virtualization and virtual interrupt
  delivery.
* Update the eoi exit bitmap when programming the ioapic or the apic's id/dfr/ldr.
* Rebased on top of KVM upstream.
Changes v2 to v3:
* Drop the Posted Interrupt patch from v3.
  Following Gleb's suggestion, we will use a global vector for all VCPUs as the
  notification event vector, so we will rewrite the Posted Interrupt patch and
  resend it later.
* Use the TMR to set the eoi exiting bitmap. We only want to set the eoi exiting
  bitmap for those interrupts which are level-triggered or have a notifier in the
  EOI write path, so the TMR is enough to distinguish the interrupt trigger mode.
* Simplify some code according to Gleb's comments.
* Rebased on top of KVM upstream.
Changes v1 to v2:
* Add Posted Interrupt support to this patch series.
* Since there is a notifier hook in the vAPIC EOI path for the PIT interrupt,
  always set the PIT interrupt in the eoi exit bitmap to force a vmexit on EOI.
* Rebased on top of KVM upstream.
Yang Zhang (3):
x86, apicv: add APICv register virtualization support
x86, apicv: add virtual interrupt delivery support
x86, apicv: add virtual x2apic support
 arch/ia64/kvm/lapic.h           |    6 +
 arch/x86/include/asm/kvm_host.h |    9 ++
 arch/x86/include/asm/vmx.h      |   14 ++
 arch/x86/kvm/irq.c              |   56 +++++++-
 arch/x86/kvm/lapic.c            |  104 ++++++++++-----
 arch/x86/kvm/lapic.h            |   31 ++++-
 arch/x86/kvm/svm.c              |   42 ++++++
 arch/x86/kvm/vmx.c              |  295 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   11 ++-
 include/linux/kvm_host.h        |    2 +
 virt/kvm/ioapic.c               |   41 ++++++
 virt/kvm/ioapic.h               |    1 +
 virt/kvm/irq_comm.c             |   20 +++
 13 files changed, 581 insertions(+), 51 deletions(-)
^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v8 1/3] x86, apicv: add APICv register virtualization support
  2013-01-07  2:02 [PATCH v8 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
@ 2013-01-07  2:02 ` Yang Zhang
  2013-01-07  2:02 ` [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
  2013-01-07  2:02 ` [PATCH v8 3/3] x86, apicv: add virtual x2apic support Yang Zhang
  2 siblings, 0 replies; 22+ messages in thread
From: Yang Zhang @ 2013-01-07  2:02 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, Yang Zhang, Kevin Tian

- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
---
 arch/x86/include/asm/vmx.h |    2 ++
 arch/x86/kvm/lapic.c       |   15 +++++++++++++++
 arch/x86/kvm/lapic.h       |    2 ++
 arch/x86/kvm/vmx.c         |   33 ++++++++++++++++++++++++++++++++-
 4 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index e385df9..44c3f7e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -66,6 +66,7 @@
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
+#define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58

 #define VMX_EXIT_REASONS \
@@ -141,6 +142,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING           0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST       0x00000080
+#define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9392f52..0664c13 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1212,6 +1212,21 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);

+/* emulate APIC access in a trap manner */
+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
+{
+	u32 val = 0;
+
+	/* hw has done the conditional check and inst decode */
+	offset &= 0xff0;
+
+	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+
+	/* TODO: optimize to just emulate side effect w/o one more write */
+	apic_reg_write(vcpu->arch.apic, offset, val);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
+
 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index e5ebf9f..9a8ee22 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -64,6 +64,8 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);

+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 23d5aec..730dc13 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,6 +84,9 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);

+static bool __read_mostly enable_apicv_reg_vid = 1;
+module_param(enable_apicv_reg_vid, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -762,6 +765,12 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }

+static inline bool cpu_has_vmx_apic_register_virt(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_APIC_REGISTER_VIRT;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2539,7 +2548,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_UNRESTRICTED_GUEST |
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
-			SECONDARY_EXEC_ENABLE_INVPCID;
+			SECONDARY_EXEC_ENABLE_INVPCID |
+			SECONDARY_EXEC_APIC_REGISTER_VIRT;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2550,6 +2560,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
 		_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
 #endif
+
+	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
+		_cpu_based_2nd_exec_control &= ~(
+				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
 		   enabled */
@@ -2747,6 +2762,9 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;

+	if (!cpu_has_vmx_apic_register_virt())
+		enable_apicv_reg_vid = 0;
+
 	if (nested)
 		nested_vmx_setup_ctls_msrs();

@@ -3826,6 +3844,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 	if (!ple_gap)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+	if (!enable_apicv_reg_vid)
+		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
 	return exec_control;
 }

@@ -4785,6 +4805,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }

+static int handle_apic_write(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	u32 offset = exit_qualification & 0xfff;
+
+	/* APIC-write VM exit is trap-like and thus no need to adjust IP */
+	kvm_apic_write_nodecode(vcpu, offset);
+	return 1;
+}
+
 static int handle_task_switch(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -5719,6 +5749,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
+	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
  2013-01-07  2:02 [PATCH v8 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
  2013-01-07  2:02 ` [PATCH v8 1/3] x86, apicv: add APICv register " Yang Zhang
@ 2013-01-07  2:02 ` Yang Zhang
  2013-01-07  7:51   ` Gleb Natapov
  2013-01-07 13:52   ` Marcelo Tosatti
  2013-01-07  2:02 ` [PATCH v8 3/3] x86, apicv: add virtual x2apic support Yang Zhang
  2 siblings, 2 replies; 22+ messages in thread
From: Yang Zhang @ 2013-01-07  2:02 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, Yang Zhang, Kevin Tian, Yang Zhang

From: Yang Zhang <yang.z.zhang@Intel.com>

Virtual interrupt delivery removes the need for KVM to inject vAPIC
interrupts manually; delivery is handled entirely by hardware. This needs
some special awareness in the existing interrupt injection path:

- For a pending interrupt, instead of injecting it directly, we may need
  to update architecture-specific indicators before resuming the guest.

- A pending interrupt that is masked by the ISR should also be considered
  in the update above, since hardware will decide when to inject it at the
  right time. The current has_interrupt and get_interrupt only return a
  valid vector from the injection point of view.
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
---
 arch/ia64/kvm/lapic.h           |    6 ++
 arch/x86/include/asm/kvm_host.h |    8 ++
 arch/x86/include/asm/vmx.h      |   11 +++
 arch/x86/kvm/irq.c              |   56 +++++++++++-
 arch/x86/kvm/lapic.c            |   87 +++++++++++-------
 arch/x86/kvm/lapic.h            |   29 +++++-
 arch/x86/kvm/svm.c              |   36 ++++++++
 arch/x86/kvm/vmx.c              |  190 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   11 ++-
 include/linux/kvm_host.h        |    2 +
 virt/kvm/ioapic.c               |   41 +++++++++
 virt/kvm/ioapic.h               |    1 +
 virt/kvm/irq_comm.c             |   20 ++++
 13 files changed, 451 insertions(+), 47 deletions(-)

diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index c5f92a9..cb59eb4 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 #define kvm_apic_present(x) (true)
 #define kvm_lapic_enabled(x) (true)

+static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
+					  struct kvm_lapic_irq *irq)
+{
+	/* IA64 has no apicv supporting, do nothing here */
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c431b33..135603f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,13 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+	int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
+	void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
+	void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
+	void (*update_exitmap_start)(struct kvm_vcpu *vcpu);
+	void (*update_exitmap_end)(struct kvm_vcpu *vcpu);
+	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
+	void (*restore_rvi)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
@@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 44c3f7e..d1ab331 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -62,6 +62,7 @@
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
@@ -143,6 +144,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING           0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST       0x00000080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000

@@ -180,6 +182,7 @@ enum vmcs_field {
 	GUEST_GS_SELECTOR               = 0x0000080a,
 	GUEST_LDTR_SELECTOR             = 0x0000080c,
 	GUEST_TR_SELECTOR               = 0x0000080e,
+	GUEST_INTR_STATUS               = 0x00000810,
 	HOST_ES_SELECTOR                = 0x00000c00,
 	HOST_CS_SELECTOR                = 0x00000c02,
 	HOST_SS_SELECTOR                = 0x00000c04,
@@ -207,6 +210,14 @@ enum vmcs_field {
 	APIC_ACCESS_ADDR_HIGH           = 0x00002015,
 	EPT_POINTER                     = 0x0000201a,
 	EPT_POINTER_HIGH                = 0x0000201b,
+	EOI_EXIT_BITMAP0                = 0x0000201c,
+	EOI_EXIT_BITMAP0_HIGH           = 0x0000201d,
+	EOI_EXIT_BITMAP1                = 0x0000201e,
+	EOI_EXIT_BITMAP1_HIGH           = 0x0000201f,
+	EOI_EXIT_BITMAP2                = 0x00002020,
+	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
+	EOI_EXIT_BITMAP3                = 0x00002022,
+	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b111aee..e113440 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL(kvm_cpu_has_pending_timer);

 /*
+ * check if there is pending interrupt from
+ * non-APIC source without intack.
+ */
+static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+{
+	if (kvm_apic_accept_pic_intr(v))
+		return pic_irqchip(v->kvm)->output;	/* PIC */
+	else
+		return 0;
+}
+
+/*
+ * check if there is injectable interrupt:
+ * when virtual interrupt delivery enabled,
+ * interrupt from apic will handled by hardware,
+ * we don't need to check it here.
+ */
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
+{
+	if (!irqchip_in_kernel(v->kvm))
+		return v->arch.interrupt.pending;
+
+	if (kvm_cpu_has_extint(v))
+		return 1;
+
+	if (kvm_apic_vid_enabled(v))
+		return 0;
+
+	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
+}
+
+/*
  * check if there is pending interrupt without
  * intack.
  */
@@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;

-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return pic_irqchip(v->kvm)->output;	/* PIC */
+	if (kvm_cpu_has_extint(v))
+		return 1;

 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);

 /*
+ * Read pending interrupt(from non-APIC source)
+ * vector and intack.
+ */
+static int kvm_cpu_get_extint(struct kvm_vcpu *v)
+{
+	if (kvm_cpu_has_extint(v))
+		return kvm_pic_read_irq(v->kvm);	/* PIC */
+	return -1;
+}
+
+/*
  * Read pending interrupt vector and intack.
  */
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
+	int vector;
+
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.nr;

-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return kvm_pic_read_irq(v->kvm);	/* PIC */
+	vector = kvm_cpu_get_extint(v);
+
+	if (kvm_apic_vid_enabled(v) || vector != -1)
+		return vector;			/* PIC */

 	return kvm_get_apic_interrupt(v);	/* APIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);

 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0664c13..e1baf37 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic)
 	return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic);
 }

+bool kvm_apic_present(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_present);
+
 #define LVT_MASK	\
 	(APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK)

@@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
 	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }

-static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
-{
-	u16 cid;
-	ldr >>= 32 - map->ldr_bits;
-	cid = (ldr >> map->cid_shift) & map->cid_mask;
-
-	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
-
-	return cid;
-}
-
-static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
-{
-	ldr >>= (32 - map->ldr_bits);
-	return ldr & map->lid_mask;
-}
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
 {
 	apic_set_reg(apic, APIC_ID, id << 24);
 	recalculate_apic_map(apic->vcpu->kvm);
+	ioapic_update_eoi_exitmap(apic->vcpu->kvm);
 }

 static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 {
 	apic_set_reg(apic, APIC_LDR, id);
 	recalculate_apic_map(apic->vcpu->kvm);
+	ioapic_update_eoi_exitmap(apic->vcpu->kvm);
 }

 static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type)
@@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 {
 	int result;

+	/* Note that irr_pending is just a hint. In the platform
+	 * that has virtual interrupt delivery supporting, virr
+	 * is cleared by hardware without updating irr_pending. */
 	if (!apic->irr_pending)
 		return -1;

@@ -458,9 +452,13 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
 }

-static inline int apic_find_highest_isr(struct kvm_lapic *apic)
+int kvm_apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
+
+	/* Note that isr_count is just a hint. In the platform
+	 * that has virtual interrupt delivery supporting, visr
+	 * will be set by hardware without updating irr_pending. */
 	if (!apic->isr_count)
 		return -1;
 	if (likely(apic->highest_isr_cache != -1))
@@ -471,6 +469,7 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic)

 	return result;
 }
+EXPORT_SYMBOL_GPL(kvm_apic_find_highest_isr);

 static void apic_update_ppr(struct kvm_lapic *apic)
 {
@@ -479,7 +478,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)

 	old_ppr = kvm_apic_get_reg(apic, APIC_PROCPRI);
 	tpr = kvm_apic_get_reg(apic, APIC_TASKPRI);
-	isr = apic_find_highest_isr(apic);
+	isr = kvm_apic_find_highest_isr(apic);
 	isrv = (isr != -1) ? isr : 0;

 	if ((tpr & 0xf0) >= (isrv & 0xf0))
@@ -740,9 +739,22 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }

+static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
+{
+	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
+		int trigger_mode;
+		if (apic_test_vector(vector, apic->regs + APIC_TMR))
+			trigger_mode = IOAPIC_LEVEL_TRIG;
+		else
+			trigger_mode = IOAPIC_EDGE_TRIG;
+		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+	}
+}
+
 static int apic_set_eoi(struct kvm_lapic *apic)
 {
-	int vector = apic_find_highest_isr(apic);
+	int vector = kvm_apic_find_highest_isr(apic);

 	trace_kvm_eoi(apic, vector);

@@ -756,19 +768,26 @@ static int apic_set_eoi(struct kvm_lapic *apic)
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);

-	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
-	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
-		int trigger_mode;
-		if (apic_test_vector(vector, apic->regs + APIC_TMR))
-			trigger_mode = IOAPIC_LEVEL_TRIG;
-		else
-			trigger_mode = IOAPIC_EDGE_TRIG;
-		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
-	}
+	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 	return vector;
 }

+/*
+ * this interface assumes a trap-like exit, which has already finished
+ * desired side effect including vISR and vPPR update.
+ */
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	trace_kvm_eoi(apic, vector);
+
+	kvm_ioapic_send_eoi(apic, vector);
+	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
+
 static void apic_send_ipi(struct kvm_lapic *apic)
 {
 	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
@@ -1071,6 +1090,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		if (!apic_x2apic_mode(apic)) {
 			apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
 			recalculate_apic_map(apic->vcpu->kvm);
+			ioapic_update_eoi_exitmap(apic->vcpu->kvm);
 		} else
 			ret = 1;
 		break;
@@ -1318,6 +1338,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 		else
 			static_key_slow_inc(&apic_hw_disabled.key);
 		recalculate_apic_map(vcpu->kvm);
+		ioapic_update_eoi_exitmap(apic->vcpu->kvm);
 	}

 	if (!kvm_vcpu_is_bsp(apic->vcpu))
@@ -1375,7 +1396,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
 	apic->irr_pending = false;
-	apic->isr_count = 0;
+	apic->isr_count = kvm_apic_vid_enabled(vcpu);
 	apic->highest_isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
@@ -1590,8 +1611,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 	update_divide_count(apic);
 	start_apic_timer(apic);
 	apic->irr_pending = true;
-	apic->isr_count = count_vectors(apic->regs + APIC_ISR);
+	apic->isr_count = kvm_apic_vid_enabled(vcpu) ?
+				1 : count_vectors(apic->regs + APIC_ISR);
 	apic->highest_isr_cache = -1;
+	kvm_x86_ops->restore_rvi(vcpu);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }

@@ -1704,7 +1727,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
 	max_irr = apic_find_highest_irr(apic);
 	if (max_irr < 0)
 		max_irr = 0;
-	max_isr = apic_find_highest_isr(apic);
+	max_isr = kvm_apic_find_highest_isr(apic);
 	if (max_isr < 0)
 		max_isr = 0;
 	data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9a8ee22..52c66a5 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -39,6 +39,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
+int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
@@ -60,11 +61,13 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
+int kvm_apic_find_highest_isr(struct kvm_lapic *apic);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);

 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);

 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
@@ -75,6 +78,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
+bool kvm_apic_present(struct kvm_vcpu *vcpu);

 static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 {
@@ -116,14 +120,31 @@ static inline int kvm_apic_sw_enabled(struct kvm_lapic *apic)
 	return APIC_SPIV_APIC_ENABLED;
 }

-static inline bool kvm_apic_present(struct kvm_vcpu *vcpu)
+static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic);
+	return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic);
 }

-static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
+static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
 {
-	return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic);
+	return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
+}
+
+static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
+{
+	u16 cid;
+	ldr >>= 32 - map->ldr_bits;
+	cid = (ldr >> map->cid_shift) & map->cid_mask;
+
+	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
+
+	return cid;
+}
+
+static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
+{
+	ldr >>= (32 - map->ldr_bits);
+	return ldr & map->lid_mask;
 }

 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d29d3cd..a8a8a4e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3571,6 +3571,36 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 }

+static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static void svm_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq)
+{
+	return;
+}
+
+static void svm_update_exitmap_start(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_update_exitmap_end(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_restore_rvi(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4290,6 +4320,12 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
+	.update_eoi_exitmap = svm_update_eoi_exitmap,
+	.update_exitmap_start = svm_update_exitmap_start,
+	.update_exitmap_end = svm_update_exitmap_end,
+	.load_eoi_exitmap = svm_load_eoi_exitmap,
+	.restore_rvi = svm_restore_rvi,

 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 730dc13..0c85c7e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -433,6 +433,9 @@ struct vcpu_vmx {

 	bool rdtscp_enabled;

+	unsigned long eoi_exit_bitmap[4];
+	spinlock_t eoi_bitmap_lock;
+
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 };
@@ -771,6 +774,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT;
 }

+static inline bool cpu_has_vmx_virtual_intr_delivery(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2549,7 +2558,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
-			SECONDARY_EXEC_APIC_REGISTER_VIRT;
+			SECONDARY_EXEC_APIC_REGISTER_VIRT |
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2563,7 +2573,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)

 	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
 		_cpu_based_2nd_exec_control &= ~(
-				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+				SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);

 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
@@ -2762,9 +2773,15 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;

-	if (!cpu_has_vmx_apic_register_virt())
+	if (!cpu_has_vmx_apic_register_virt() ||
+	    !cpu_has_vmx_virtual_intr_delivery())
 		enable_apicv_reg_vid = 0;

+	if (enable_apicv_reg_vid)
+		kvm_x86_ops->update_cr8_intercept = NULL;
+	else
+		kvm_x86_ops->update_apic_irq = NULL;
+
 	if (nested)
 		nested_vmx_setup_ctls_msrs();

@@ -3845,7 +3862,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!ple_gap)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!enable_apicv_reg_vid)
-		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
+		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 	return exec_control;
 }

@@ -3890,6 +3908,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				vmx_secondary_exec_control(vmx));
 	}

+	if (enable_apicv_reg_vid) {
+		vmcs_write64(EOI_EXIT_BITMAP0, 0);
+		vmcs_write64(EOI_EXIT_BITMAP1, 0);
+		vmcs_write64(EOI_EXIT_BITMAP2, 0);
+		vmcs_write64(EOI_EXIT_BITMAP3, 0);
+		spin_lock_init(&vmx->eoi_bitmap_lock);
+
+		vmcs_write16(GUEST_INTR_STATUS, 0);
+	}
+
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmcs_write32(PLE_WINDOW, ple_window);
@@ -4805,6 +4833,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }

+static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	int vector = exit_qualification & 0xff;
+
+	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
+	kvm_apic_set_eoi_accelerated(vcpu, vector);
+	return 1;
+}
+
 static int handle_apic_write(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -5750,6 +5788,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
+	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
@@ -6099,6 +6138,142 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	vmcs_write32(TPR_THRESHOLD, irr);
 }

+static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
+{
+	return enable_apicv_reg_vid;
+}
+
+static void vmx_restore_rvi(struct kvm_vcpu *vcpu)
+{
+	int isr;
+	u16 status;
+	u8 old;
+
+	if (!enable_apicv_reg_vid)
+		return;
+
+	isr = kvm_apic_find_highest_isr(vcpu->arch.apic);
+	if (isr == -1)
+		return;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = status >> 8;
+	if (isr != old) {
+		status &= 0xff;
+		status |= isr << 8;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_update_rvi(int vector)
+{
+	u16 status;
+	u8 old;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = (u8)status & 0xff;
+	if ((u8)vector != old) {
+		status &= ~0xff;
+		status |= (u8)vector;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr)
+{
+	if (max_irr == -1)
+		return;
+
+	vmx_update_rvi(max_irr);
+}
+
+static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
+				u32 vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!enable_apicv_reg_vid)
+		return;
+
+	if (WARN_ONCE((vector > 255),
+		"KVM VMX: vector (%d) out of range\n", vector))
+		return;
+
+	set_bit(vector, vmx->eoi_exit_bitmap);
+
+	kvm_make_request(KVM_REQ_EOIBITMAP, vcpu);
+}
+
+void vmx_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_lapic **dst;
+	struct kvm_apic_map *map;
+	unsigned long bitmap = 1;
+	int i;
+
+	rcu_read_lock();
+	map = rcu_dereference(kvm->arch.apic_map);
+
+	if (unlikely(!map)) {
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			set_eoi_exitmap_one(vcpu, irq->vector);
+		goto out;
+	}
+
+	if (irq->dest_mode == 0) { /* physical mode */
+		if (irq->delivery_mode == APIC_DM_LOWEST ||
+				irq->dest_id == 0xff) {
+			kvm_for_each_vcpu(i, vcpu, kvm)
+				set_eoi_exitmap_one(vcpu, irq->vector);
+			goto out;
+		}
+		dst = &map->phys_map[irq->dest_id & 0xff];
+	} else {
+		u32 mda = irq->dest_id << (32 - map->ldr_bits);
+
+		dst = map->logical_map[apic_cluster_id(map, mda)];
+
+		bitmap = apic_logical_id(map, mda);
+	}
+
+	for_each_set_bit(i, &bitmap, 16) {
+		if (!dst[i])
+			continue;
+		set_eoi_exitmap_one(dst[i]->vcpu, irq->vector);
+	}
+
+out:
+	rcu_read_unlock();
+}
+
+static void vmx_update_exitmap_start(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	memset(vmx->eoi_exit_bitmap, 0, 32);
+	spin_lock(&vmx->eoi_bitmap_lock);
+}
+
+static void vmx_update_exitmap_end(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	spin_unlock(&vmx->eoi_bitmap_lock);
+}
+
+static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	spin_lock(&vmx->eoi_bitmap_lock);
+	vmcs_write64(EOI_EXIT_BITMAP0, vmx->eoi_exit_bitmap[0]);
+	vmcs_write64(EOI_EXIT_BITMAP1, vmx->eoi_exit_bitmap[1]);
+	vmcs_write64(EOI_EXIT_BITMAP2, vmx->eoi_exit_bitmap[2]);
+	vmcs_write64(EOI_EXIT_BITMAP3, vmx->eoi_exit_bitmap[3]);
+	spin_unlock(&vmx->eoi_bitmap_lock);
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7362,6 +7537,13 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
+	.update_apic_irq = vmx_update_apic_irq,
+	.update_eoi_exitmap = vmx_update_eoi_exitmap,
+	.update_exitmap_start = vmx_update_exitmap_start,
+	.update_exitmap_end = vmx_update_exitmap_end,
+	.load_eoi_exitmap = vmx_load_eoi_exitmap,
+	.restore_rvi = vmx_restore_rvi,

 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff
--git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1c9c834..ffacfef 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5527,7 +5527,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu) vcpu->arch.nmi_injected = true; kvm_x86_ops->set_nmi(vcpu); } - } else if (kvm_cpu_has_interrupt(vcpu)) { + } else if (kvm_cpu_has_injectable_intr(vcpu)) { if (kvm_x86_ops->interrupt_allowed(vcpu)) { kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false); @@ -5648,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_handle_pmu_event(vcpu); if (kvm_check_request(KVM_REQ_PMI, vcpu)) kvm_deliver_pmi(vcpu); + if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu)) + kvm_x86_ops->load_eoi_exitmap(vcpu); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { @@ -5656,10 +5658,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) /* enable NMI/IRQ window open exits if needed */ if (vcpu->arch.nmi_pending) kvm_x86_ops->enable_nmi_window(vcpu); - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) + else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win) kvm_x86_ops->enable_irq_window(vcpu); if (kvm_lapic_enabled(vcpu)) { + /* update archtecture specific hints for APIC + * virtual interrupt delivery */ + if (kvm_x86_ops->update_apic_irq) + kvm_x86_ops->update_apic_irq(vcpu, + kvm_lapic_find_highest_irr(vcpu)); update_cr8_intercept(vcpu); kvm_lapic_sync_to_vapic(vcpu); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index cbe0d68..2383202 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -122,6 +122,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQ_WATCHDOG 18 #define KVM_REQ_MASTERCLOCK_UPDATE 19 #define KVM_REQ_MCLOCK_INPROGRESS 20 +#define KVM_REQ_EOIBITMAP 21 #define KVM_USERSPACE_IRQ_SOURCE_ID 0 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 @@ -690,6 +691,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); int kvm_set_irq_inatomic(struct kvm *kvm, int 
irq_source_id, u32 irq, int level); int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, int irq_source_id, int level); +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index f3abbef..4341ad3 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -115,6 +115,46 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic) smp_wmb(); } +static void ioapic_update_eoi_exitmap_one(struct kvm_ioapic *ioapic, int pin) +{ + union kvm_ioapic_redirect_entry *e; + + e = &ioapic->redirtbl[pin]; + + if ((e->fields.trig_mode == IOAPIC_LEVEL_TRIG || + kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin))) { + struct kvm_lapic_irq irqe; + + irqe.dest_id = e->fields.dest_id; + irqe.vector = e->fields.vector; + irqe.dest_mode = e->fields.dest_mode; + irqe.delivery_mode = e->fields.delivery_mode << 8; + kvm_x86_ops->update_eoi_exitmap(ioapic->kvm, &irqe); + } +} + +void ioapic_update_eoi_exitmap(struct kvm *kvm) +{ + struct kvm_ioapic *ioapic = kvm->arch.vioapic; + union kvm_ioapic_redirect_entry *e; + int index; + struct kvm_vcpu *vcpu; + + kvm_for_each_vcpu(index, vcpu, kvm) + kvm_x86_ops->update_exitmap_start(vcpu); + + for (index = 0; index < IOAPIC_NUM_PINS; index++) { + e = &ioapic->redirtbl[index]; + if (!e->fields.mask) + ioapic_update_eoi_exitmap_one(ioapic, index); + } + + kvm_for_each_vcpu(index, vcpu, kvm) { + kvm_x86_ops->update_exitmap_end(vcpu); + kvm_vcpu_kick(vcpu); + } +} + static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) { unsigned index; @@ -156,6 +196,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG && ioapic->irr & (1 << index)) ioapic_service(ioapic, index); + 
ioapic_update_eoi_exitmap(ioapic->kvm); break; } } diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h index a30abfe..e8d67cf 100644 --- a/virt/kvm/ioapic.h +++ b/virt/kvm/ioapic.h @@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq); int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); +void ioapic_update_eoi_exitmap(struct kvm *kvm); #endif diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 656fa45..5455f5a 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -237,6 +237,24 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level) return ret; } +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin) +{ + struct kvm_irq_ack_notifier *kian; + struct hlist_node *n; + int gsi; + + rcu_read_lock(); + gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin]; + if (gsi != -1) + hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list, + link) + if (kian->gsi == gsi) + return true; + rcu_read_unlock(); + + return false; +} + void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin) { struct kvm_irq_ack_notifier *kian; @@ -261,6 +279,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm, mutex_lock(&kvm->irq_lock); hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list); mutex_unlock(&kvm->irq_lock); + ioapic_update_eoi_exitmap(kvm); } void kvm_unregister_irq_ack_notifier(struct kvm *kvm, @@ -270,6 +289,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm, hlist_del_init_rcu(&kian->link); mutex_unlock(&kvm->irq_lock); synchronize_rcu(); + ioapic_update_eoi_exitmap(kvm); } int kvm_request_irq_source_id(struct kvm *kvm) -- 1.7.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
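
[Editor's note: the patch above manipulates the 16-bit GUEST_INTR_STATUS field, which packs RVI (requesting virtual interrupt) in bits 7:0 and SVI (servicing virtual interrupt) in bits 15:8; vmx_update_rvi() and vmx_restore_rvi() each rewrite one byte while preserving the other. A minimal user-space sketch of that byte packing follows — the helper names and uint16_t stand-in for the VMCS field are illustrative, not KVM code.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for GUEST_INTR_STATUS:
 * bits 7:0 hold RVI, bits 15:8 hold SVI. */
static uint16_t set_rvi(uint16_t status, int vector)
{
	uint8_t old = status & 0xff;

	if ((uint8_t)vector != old) {
		status &= ~0xff;		/* clear the RVI byte only */
		status |= (uint8_t)vector;
	}
	return status;
}

static uint16_t set_svi(uint16_t status, int isr)
{
	uint8_t old = status >> 8;

	if ((uint8_t)isr != old) {
		status &= 0xff;			/* clear the SVI byte only */
		status |= (uint16_t)isr << 8;
	}
	return status;
}
```

In the real code the read-modify-write goes through vmcs_read16()/vmcs_write16(), and the write is skipped when the byte is unchanged to avoid a redundant VMCS access.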
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
  2013-01-07  2:02 ` [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
@ 2013-01-07  7:51 ` Gleb Natapov
  2013-01-07 13:04   ` Zhang, Yang Z
  2013-01-07 13:52 ` Marcelo Tosatti
  1 sibling, 1 reply; 22+ messages in thread
From: Gleb Natapov @ 2013-01-07  7:51 UTC (permalink / raw)
To: Yang Zhang; +Cc: kvm, haitao.shan, mtosatti, Kevin Tian

On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Virtual interrupt delivery avoids the need for KVM to inject vAPIC
> interrupts manually; that work is taken care of entirely by the
> hardware. This requires some special awareness in the existing
> interrupt injection path:
> 
> - For a pending interrupt, instead of injecting it directly, we may
>   need to update architecture-specific indicators before resuming
>   the guest.
> 
> - A pending interrupt that is masked by the ISR should also be
>   considered in the update above, since the hardware will decide
>   when to inject it at the right time. The current has_interrupt and
>   get_interrupt only return a valid vector from the injection
>   point of view.
> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > --- > arch/ia64/kvm/lapic.h | 6 ++ > arch/x86/include/asm/kvm_host.h | 8 ++ > arch/x86/include/asm/vmx.h | 11 +++ > arch/x86/kvm/irq.c | 56 +++++++++++- > arch/x86/kvm/lapic.c | 87 +++++++++++------- > arch/x86/kvm/lapic.h | 29 +++++- > arch/x86/kvm/svm.c | 36 ++++++++ > arch/x86/kvm/vmx.c | 190 ++++++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 11 ++- > include/linux/kvm_host.h | 2 + > virt/kvm/ioapic.c | 41 +++++++++ > virt/kvm/ioapic.h | 1 + > virt/kvm/irq_comm.c | 20 ++++ > 13 files changed, 451 insertions(+), 47 deletions(-) > > diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h > index c5f92a9..cb59eb4 100644 > --- a/arch/ia64/kvm/lapic.h > +++ b/arch/ia64/kvm/lapic.h > @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > #define kvm_apic_present(x) (true) > #define kvm_lapic_enabled(x) (true) > > +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, > + struct kvm_lapic_irq *irq) > +{ > + /* IA64 has no apicv supporting, do nothing here */ > +} > + > #endif > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index c431b33..135603f 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -697,6 +697,13 @@ struct kvm_x86_ops { > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); > + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); > + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); > + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); > + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); The amount of callbacks to update exit bitmap start to 
become insane. > + void (*restore_rvi)(struct kvm_vcpu *vcpu); rvi? Call it set_svi() and make it do just that - set svi. > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); > int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); > void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); > int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); > int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); > int kvm_cpu_get_interrupt(struct kvm_vcpu *v); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 44c3f7e..d1ab331 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -62,6 +62,7 @@ > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > +#define EXIT_REASON_EOI_INDUCED 45 > #define EXIT_REASON_EPT_VIOLATION 48 > #define EXIT_REASON_EPT_MISCONFIG 49 > #define EXIT_REASON_WBINVD 54 > @@ -143,6 +144,7 @@ > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > @@ -180,6 +182,7 @@ enum vmcs_field { > GUEST_GS_SELECTOR = 0x0000080a, > GUEST_LDTR_SELECTOR = 0x0000080c, > GUEST_TR_SELECTOR = 0x0000080e, > + GUEST_INTR_STATUS = 0x00000810, > HOST_ES_SELECTOR = 0x00000c00, > HOST_CS_SELECTOR = 0x00000c02, > HOST_SS_SELECTOR = 0x00000c04, > @@ -207,6 +210,14 @@ enum vmcs_field { > APIC_ACCESS_ADDR_HIGH = 0x00002015, > EPT_POINTER = 0x0000201a, > EPT_POINTER_HIGH = 0x0000201b, > + 
EOI_EXIT_BITMAP0 = 0x0000201c, > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > + EOI_EXIT_BITMAP1 = 0x0000201e, > + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > + EOI_EXIT_BITMAP2 = 0x00002020, > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > + EOI_EXIT_BITMAP3 = 0x00002022, > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > GUEST_PHYSICAL_ADDRESS = 0x00002400, > GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > VMCS_LINK_POINTER = 0x00002800, > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > index b111aee..e113440 100644 > --- a/arch/x86/kvm/irq.c > +++ b/arch/x86/kvm/irq.c > @@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) > EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > > /* > + * check if there is pending interrupt from > + * non-APIC source without intack. > + */ > +static int kvm_cpu_has_extint(struct kvm_vcpu *v) > +{ > + if (kvm_apic_accept_pic_intr(v)) > + return pic_irqchip(v->kvm)->output; /* PIC */ > + else > + return 0; > +} > + > +/* > + * check if there is injectable interrupt: > + * when virtual interrupt delivery enabled, > + * interrupt from apic will handled by hardware, > + * we don't need to check it here. > + */ > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) > +{ > + if (!irqchip_in_kernel(v->kvm)) > + return v->arch.interrupt.pending; > + > + if (kvm_cpu_has_extint(v)) > + return 1; > + > + if (kvm_apic_vid_enabled(v)) > + return 0; > + > + return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > +} > + > +/* > * check if there is pending interrupt without > * intack. 
> */ > @@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.pending; > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > - return pic_irqchip(v->kvm)->output; /* PIC */ > + if (kvm_cpu_has_extint(v)) > + return 1; > > return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > } > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > /* > + * Read pending interrupt(from non-APIC source) > + * vector and intack. > + */ > +static int kvm_cpu_get_extint(struct kvm_vcpu *v) > +{ > + if (kvm_cpu_has_extint(v)) > + return kvm_pic_read_irq(v->kvm); /* PIC */ > + return -1; > +} > + > +/* > * Read pending interrupt vector and intack. > */ > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > { > + int vector; > + > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.nr; > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > - return kvm_pic_read_irq(v->kvm); /* PIC */ > + vector = kvm_cpu_get_extint(v); > + > + if (kvm_apic_vid_enabled(v) || vector != -1) > + return vector; /* PIC */ > > return kvm_get_apic_interrupt(v); /* APIC */ > } > -EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > { > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 0664c13..e1baf37 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic) > return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic); > } > > +bool kvm_apic_present(struct kvm_vcpu *vcpu) > +{ > + return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_present); > + Why is this change? Drop it. 
> #define LVT_MASK \ > (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK) > > @@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic) > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; > } > > -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > -{ > - u16 cid; > - ldr >>= 32 - map->ldr_bits; > - cid = (ldr >> map->cid_shift) & map->cid_mask; > - > - BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > - > - return cid; > -} > - > -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > -{ > - ldr >>= (32 - map->ldr_bits); > - return ldr & map->lid_mask; > -} > - > static void recalculate_apic_map(struct kvm *kvm) > { > struct kvm_apic_map *new, *old = NULL; > @@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id) > { > apic_set_reg(apic, APIC_ID, id << 24); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) > { > apic_set_reg(apic, APIC_LDR, id); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type) > @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic) > { > int result; > > + /* Note that irr_pending is just a hint. In the platform > + * that has virtual interrupt delivery supporting, virr > + * is cleared by hardware without updating irr_pending. */ Just say that with vid enables irr_pending will be always true. > if (!apic->irr_pending) > return -1; > > @@ -458,9 +452,13 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) > __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention); > } > > -static inline int apic_find_highest_isr(struct kvm_lapic *apic) > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic) > { > int result; > + > + /* Note that isr_count is just a hint. 
In the platform > + * that has virtual interrupt delivery supporting, visr > + * will be set by hardware without updating irr_pending. */ isr_count is not a hint! Without vid it has to be precise. Again just say that with vid enabled isr_count is always 1. > if (!apic->isr_count) > return -1; > if (likely(apic->highest_isr_cache != -1)) > @@ -471,6 +469,7 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic) > > return result; > } > +EXPORT_SYMBOL_GPL(kvm_apic_find_highest_isr); > > static void apic_update_ppr(struct kvm_lapic *apic) > { > @@ -479,7 +478,7 @@ static void apic_update_ppr(struct kvm_lapic *apic) > > old_ppr = kvm_apic_get_reg(apic, APIC_PROCPRI); > tpr = kvm_apic_get_reg(apic, APIC_TASKPRI); > - isr = apic_find_highest_isr(apic); > + isr = kvm_apic_find_highest_isr(apic); > isrv = (isr != -1) ? isr : 0; > > if ((tpr & 0xf0) >= (isrv & 0xf0)) > @@ -740,9 +739,22 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > } > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > +{ > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > + int trigger_mode; > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > + trigger_mode = IOAPIC_LEVEL_TRIG; > + else > + trigger_mode = IOAPIC_EDGE_TRIG; > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > + } > +} > + > static int apic_set_eoi(struct kvm_lapic *apic) > { > - int vector = apic_find_highest_isr(apic); > + int vector = kvm_apic_find_highest_isr(apic); > > trace_kvm_eoi(apic, vector); > > @@ -756,19 +768,26 @@ static int apic_set_eoi(struct kvm_lapic *apic) > apic_clear_isr(vector, apic); > apic_update_ppr(apic); > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > - int trigger_mode; > - if 
(apic_test_vector(vector, apic->regs + APIC_TMR)) > - trigger_mode = IOAPIC_LEVEL_TRIG; > - else > - trigger_mode = IOAPIC_EDGE_TRIG; > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > - } > + kvm_ioapic_send_eoi(apic, vector); > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > return vector; > } > > +/* > + * this interface assumes a trap-like exit, which has already finished > + * desired side effect including vISR and vPPR update. > + */ > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + trace_kvm_eoi(apic, vector); > + > + kvm_ioapic_send_eoi(apic, vector); > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > + > static void apic_send_ipi(struct kvm_lapic *apic) > { > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > @@ -1071,6 +1090,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) > if (!apic_x2apic_mode(apic)) { > apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } else > ret = 1; > break; > @@ -1318,6 +1338,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) > else > static_key_slow_inc(&apic_hw_disabled.key); > recalculate_apic_map(vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > if (!kvm_vcpu_is_bsp(apic->vcpu)) > @@ -1375,7 +1396,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu) > apic_set_reg(apic, APIC_TMR + 0x10 * i, 0); > } > apic->irr_pending = false; irr_pending should be set to true if vid is enabled. 
> - apic->isr_count = 0; > + apic->isr_count = kvm_apic_vid_enabled(vcpu); > apic->highest_isr_cache = -1; > update_divide_count(apic); > atomic_set(&apic->lapic_timer.pending, 0); > @@ -1590,8 +1611,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > update_divide_count(apic); > start_apic_timer(apic); > apic->irr_pending = true; > - apic->isr_count = count_vectors(apic->regs + APIC_ISR); > + apic->isr_count = kvm_apic_vid_enabled(vcpu) ? > + 1 : count_vectors(apic->regs + APIC_ISR); > apic->highest_isr_cache = -1; > + kvm_x86_ops->restore_rvi(vcpu); > kvm_make_request(KVM_REQ_EVENT, vcpu); > } > > @@ -1704,7 +1727,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu) > max_irr = apic_find_highest_irr(apic); > if (max_irr < 0) > max_irr = 0; > - max_isr = apic_find_highest_isr(apic); > + max_isr = kvm_apic_find_highest_isr(apic); > if (max_isr < 0) > max_isr = 0; > data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24); > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 9a8ee22..52c66a5 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -39,6 +39,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > void kvm_lapic_reset(struct kvm_vcpu *vcpu); > u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); > void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); > @@ -60,11 +61,13 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data); > void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > struct kvm_lapic_state *s); > int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic); > > u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > > void 
kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > > void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > @@ -75,6 +78,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); > > int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); > int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); > +bool kvm_apic_present(struct kvm_vcpu *vcpu); > > static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu) > { > @@ -116,14 +120,31 @@ static inline int kvm_apic_sw_enabled(struct kvm_lapic *apic) > return APIC_SPIV_APIC_ENABLED; > } > > -static inline bool kvm_apic_present(struct kvm_vcpu *vcpu) > +static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > { > - return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > + return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > } > > -static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > { > - return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); > +} > + > +static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > +{ > + u16 cid; > + ldr >>= 32 - map->ldr_bits; > + cid = (ldr >> map->cid_shift) & map->cid_mask; > + > + BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > + > + return cid; > +} > + > +static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > +{ > + ldr >>= (32 - map->ldr_bits); > + return ldr & map->lid_mask; > } > > #endif > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index d29d3cd..a8a8a4e 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3571,6 +3571,36 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > set_cr_intercept(svm, 
INTERCEPT_CR8_WRITE); > } > > +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return 0; > +} > + > +static void svm_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq) > +{ > + return; > +} > + > +static void svm_update_exitmap_start(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > +static void svm_update_exitmap_end(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > +static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > +static void svm_restore_rvi(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -4290,6 +4320,12 @@ static struct kvm_x86_ops svm_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > + .update_eoi_exitmap = svm_update_eoi_exitmap, > + .update_exitmap_start = svm_update_exitmap_start, > + .update_exitmap_end = svm_update_exitmap_end, > + .load_eoi_exitmap = svm_load_eoi_exitmap, > + .restore_rvi = svm_restore_rvi, > > .set_tss_addr = svm_set_tss_addr, > .get_tdp_level = get_npt_level, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 730dc13..0c85c7e 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -433,6 +433,9 @@ struct vcpu_vmx { > > bool rdtscp_enabled; > > + unsigned long eoi_exit_bitmap[4]; > + spinlock_t eoi_bitmap_lock; Why not mutex? 
> + > /* Support for a guest hypervisor (nested VMX) */ > struct nested_vmx nested; > }; > @@ -771,6 +774,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void) > SECONDARY_EXEC_APIC_REGISTER_VIRT; > } > > +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > +{ > + return vmcs_config.cpu_based_2nd_exec_ctrl & > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > +} > + > static inline bool cpu_has_vmx_flexpriority(void) > { > return cpu_has_vmx_tpr_shadow() && > @@ -2549,7 +2558,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > SECONDARY_EXEC_PAUSE_LOOP_EXITING | > SECONDARY_EXEC_RDTSCP | > SECONDARY_EXEC_ENABLE_INVPCID | > - SECONDARY_EXEC_APIC_REGISTER_VIRT; > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > if (adjust_vmx_controls(min2, opt2, > MSR_IA32_VMX_PROCBASED_CTLS2, > &_cpu_based_2nd_exec_control) < 0) > @@ -2563,7 +2573,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > > if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > _cpu_based_2nd_exec_control &= ~( > - SECONDARY_EXEC_APIC_REGISTER_VIRT); > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { > /* CR3 accesses and invlpg don't need to cause VM Exits when EPT > @@ -2762,9 +2773,15 @@ static __init int hardware_setup(void) > if (!cpu_has_vmx_ple()) > ple_gap = 0; > > - if (!cpu_has_vmx_apic_register_virt()) > + if (!cpu_has_vmx_apic_register_virt() || > + !cpu_has_vmx_virtual_intr_delivery()) > enable_apicv_reg_vid = 0; > > + if (enable_apicv_reg_vid) > + kvm_x86_ops->update_cr8_intercept = NULL; > + else > + kvm_x86_ops->update_apic_irq = NULL; > + > if (nested) > nested_vmx_setup_ctls_msrs(); > > @@ -3845,7 +3862,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > if (!ple_gap) > exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > if (!enable_apicv_reg_vid) > - exec_control &= 
~SECONDARY_EXEC_APIC_REGISTER_VIRT; > + exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > return exec_control; > } > > @@ -3890,6 +3908,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > vmx_secondary_exec_control(vmx)); > } > > + if (enable_apicv_reg_vid) { > + vmcs_write64(EOI_EXIT_BITMAP0, 0); > + vmcs_write64(EOI_EXIT_BITMAP1, 0); > + vmcs_write64(EOI_EXIT_BITMAP2, 0); > + vmcs_write64(EOI_EXIT_BITMAP3, 0); > + spin_lock_init(&vmx->eoi_bitmap_lock); > + > + vmcs_write16(GUEST_INTR_STATUS, 0); > + } > + > if (ple_gap) { > vmcs_write32(PLE_GAP, ple_gap); > vmcs_write32(PLE_WINDOW, ple_window); > @@ -4805,6 +4833,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu) > return emulate_instruction(vcpu, 0) == EMULATE_DONE; > } > > +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > +{ > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > + int vector = exit_qualification & 0xff; > + > + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > + kvm_apic_set_eoi_accelerated(vcpu, vector); > + return 1; > +} > + > static int handle_apic_write(struct kvm_vcpu *vcpu) > { > unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > @@ -5750,6 +5788,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { > [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > [EXIT_REASON_APIC_WRITE] = handle_apic_write, > + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > [EXIT_REASON_WBINVD] = handle_wbinvd, > [EXIT_REASON_XSETBV] = handle_xsetbv, > [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > @@ -6099,6 +6138,142 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > vmcs_write32(TPR_THRESHOLD, irr); > } > > +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return enable_apicv_reg_vid; > +} > + > +static void vmx_restore_rvi(struct 
kvm_vcpu *vcpu) This has nothing to do with rvi. Call it set_svi() pass it highest isr as a parameter. Drop apic_find_highest_isr() renaming. > +{ > + int isr; > + u16 status; > + u8 old; > + > + if (!enable_apicv_reg_vid) > + return; > + > + isr = kvm_apic_find_highest_isr(vcpu->arch.apic); > + if (isr == -1) > + return; If there is not isr set svi should be set to zero. > + > + status = vmcs_read16(GUEST_INTR_STATUS); > + old = status >> 8; > + if (isr != old) { > + status &= 0xff; > + status |= isr << 8; > + vmcs_write16(GUEST_INTR_STATUS, status); > + } > +} > + > +static void vmx_update_rvi(int vector) set_rvi() for consistency. > +{ > + u16 status; > + u8 old; > + > + status = vmcs_read16(GUEST_INTR_STATUS); > + old = (u8)status & 0xff; > + if ((u8)vector != old) { > + status &= ~0xff; > + status |= (u8)vector; > + vmcs_write16(GUEST_INTR_STATUS, status); > + } > +} > + > +static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr) > +{ > + if (max_irr == -1) > + return; > + > + vmx_update_rvi(max_irr); > +} > + > +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu, > + u32 vector) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + if (!enable_apicv_reg_vid) > + return; > + > + if (WARN_ONCE((vector > 255), > + "KVM VMX: vector (%d) out of range\n", vector)) > + return; > + > + set_bit(vector, vmx->eoi_exit_bitmap); Since eoi_exit_bitmap is accessed under lock __set_bit() can be used here. 
> + > + kvm_make_request(KVM_REQ_EOIBITMAP, vcpu); > +} > + > +void vmx_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq) > +{ > + struct kvm_vcpu *vcpu; > + struct kvm_lapic **dst; > + struct kvm_apic_map *map; > + unsigned long bitmap = 1; > + int i; > + > + rcu_read_lock(); > + map = rcu_dereference(kvm->arch.apic_map); > + > + if (unlikely(!map)) { > + kvm_for_each_vcpu(i, vcpu, kvm) > + set_eoi_exitmap_one(vcpu, irq->vector); > + goto out; > + } > + > + if (irq->dest_mode == 0) { /* physical mode */ > + if (irq->delivery_mode == APIC_DM_LOWEST || > + irq->dest_id == 0xff) { > + kvm_for_each_vcpu(i, vcpu, kvm) > + set_eoi_exitmap_one(vcpu, irq->vector); > + goto out; > + } > + dst = &map->phys_map[irq->dest_id & 0xff]; > + } else { > + u32 mda = irq->dest_id << (32 - map->ldr_bits); > + > + dst = map->logical_map[apic_cluster_id(map, mda)]; > + > + bitmap = apic_logical_id(map, mda); > + } > + > + for_each_set_bit(i, &bitmap, 16) { > + if (!dst[i]) > + continue; > + set_eoi_exitmap_one(dst[i]->vcpu, irq->vector); > + } > + > +out: > + rcu_read_unlock(); > +} > + > +static void vmx_update_exitmap_start(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + memset(vmx->eoi_exit_bitmap, 0, 32); > + spin_lock(&vmx->eoi_bitmap_lock); So you introduced the lock and access the thing that the lock should protect outside of it? 
> +} > + > +static void vmx_update_exitmap_end(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + spin_unlock(&vmx->eoi_bitmap_lock); > +} > + > +static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + spin_lock(&vmx->eoi_bitmap_lock); > + vmcs_write64(EOI_EXIT_BITMAP0, vmx->eoi_exit_bitmap[0]); > + vmcs_write64(EOI_EXIT_BITMAP1, vmx->eoi_exit_bitmap[1]); > + vmcs_write64(EOI_EXIT_BITMAP2, vmx->eoi_exit_bitmap[2]); > + vmcs_write64(EOI_EXIT_BITMAP3, vmx->eoi_exit_bitmap[3]); > + spin_unlock(&vmx->eoi_bitmap_lock); > +} > + > static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) > { > u32 exit_intr_info; > @@ -7362,6 +7537,13 @@ static struct kvm_x86_ops vmx_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, > + .update_apic_irq = vmx_update_apic_irq, > + .update_eoi_exitmap = vmx_update_eoi_exitmap, > + .update_exitmap_start = vmx_update_exitmap_start, > + .update_exitmap_end = vmx_update_exitmap_end, > + .load_eoi_exitmap = vmx_load_eoi_exitmap, > + .restore_rvi = vmx_restore_rvi, > > .set_tss_addr = vmx_set_tss_addr, > .get_tdp_level = get_ept_level, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 1c9c834..ffacfef 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5527,7 +5527,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu) > vcpu->arch.nmi_injected = true; > kvm_x86_ops->set_nmi(vcpu); > } > - } else if (kvm_cpu_has_interrupt(vcpu)) { > + } else if (kvm_cpu_has_injectable_intr(vcpu)) { > if (kvm_x86_ops->interrupt_allowed(vcpu)) { > kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), > false); > @@ -5648,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > kvm_handle_pmu_event(vcpu); > if (kvm_check_request(KVM_REQ_PMI, vcpu)) > kvm_deliver_pmi(vcpu); > + 
if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu)) > + kvm_x86_ops->load_eoi_exitmap(vcpu); > } > > if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > @@ -5656,10 +5658,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > /* enable NMI/IRQ window open exits if needed */ > if (vcpu->arch.nmi_pending) > kvm_x86_ops->enable_nmi_window(vcpu); > - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > + else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win) > kvm_x86_ops->enable_irq_window(vcpu); > > if (kvm_lapic_enabled(vcpu)) { > + /* update archtecture specific hints for APIC > + * virtual interrupt delivery */ > + if (kvm_x86_ops->update_apic_irq) > + kvm_x86_ops->update_apic_irq(vcpu, > + kvm_lapic_find_highest_irr(vcpu)); > update_cr8_intercept(vcpu); > kvm_lapic_sync_to_vapic(vcpu); > } > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index cbe0d68..2383202 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -122,6 +122,7 @@ static inline bool is_error_page(struct page *page) > #define KVM_REQ_WATCHDOG 18 > #define KVM_REQ_MASTERCLOCK_UPDATE 19 > #define KVM_REQ_MCLOCK_INPROGRESS 20 > +#define KVM_REQ_EOIBITMAP 21 > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 > @@ -690,6 +691,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); > int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level); > int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, > int irq_source_id, int level); > +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin); > void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); > void kvm_register_irq_ack_notifier(struct kvm *kvm, > struct kvm_irq_ack_notifier *kian); > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index f3abbef..4341ad3 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -115,6 +115,46 @@ static 
void update_handled_vectors(struct kvm_ioapic *ioapic) > smp_wmb(); > } > > +static void ioapic_update_eoi_exitmap_one(struct kvm_ioapic *ioapic, int pin) > +{ > + union kvm_ioapic_redirect_entry *e; > + > + e = &ioapic->redirtbl[pin]; > + > + if ((e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > + kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin))) { > + struct kvm_lapic_irq irqe; > + > + irqe.dest_id = e->fields.dest_id; > + irqe.vector = e->fields.vector; > + irqe.dest_mode = e->fields.dest_mode; > + irqe.delivery_mode = e->fields.delivery_mode << 8; > + kvm_x86_ops->update_eoi_exitmap(ioapic->kvm, &irqe); > + } > +} > + > +void ioapic_update_eoi_exitmap(struct kvm *kvm) > +{ > + struct kvm_ioapic *ioapic = kvm->arch.vioapic; > + union kvm_ioapic_redirect_entry *e; > + int index; > + struct kvm_vcpu *vcpu; > + The function should return right here if vid is not enabled. > + kvm_for_each_vcpu(index, vcpu, kvm) > + kvm_x86_ops->update_exitmap_start(vcpu); > + > + for (index = 0; index < IOAPIC_NUM_PINS; index++) { > + e = &ioapic->redirtbl[index]; > + if (!e->fields.mask) > + ioapic_update_eoi_exitmap_one(ioapic, index); > + } > + > + kvm_for_each_vcpu(index, vcpu, kvm) { > + kvm_x86_ops->update_exitmap_end(vcpu); > + kvm_vcpu_kick(vcpu); > + } Since you take all of the locks at the start of the bitmap update and release them at the end why not have global lock? 
> +} > + > static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) > { > unsigned index; > @@ -156,6 +196,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) > if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG > && ioapic->irr & (1 << index)) > ioapic_service(ioapic, index); > + ioapic_update_eoi_exitmap(ioapic->kvm); > break; > } > } > diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h > index a30abfe..e8d67cf 100644 > --- a/virt/kvm/ioapic.h > +++ b/virt/kvm/ioapic.h > @@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, > struct kvm_lapic_irq *irq); > int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); > int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); > +void ioapic_update_eoi_exitmap(struct kvm *kvm); > > #endif > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c > index 656fa45..5455f5a 100644 > --- a/virt/kvm/irq_comm.c > +++ b/virt/kvm/irq_comm.c > @@ -237,6 +237,24 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level) > return ret; > } > > +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin) > +{ > + struct kvm_irq_ack_notifier *kian; > + struct hlist_node *n; > + int gsi; > + > + rcu_read_lock(); > + gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin]; > + if (gsi != -1) > + hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list, > + link) > + if (kian->gsi == gsi) > + return true; > + rcu_read_unlock(); > + > + return false; > +} > + > void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin) > { > struct kvm_irq_ack_notifier *kian; > @@ -261,6 +279,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm, > mutex_lock(&kvm->irq_lock); > hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list); > mutex_unlock(&kvm->irq_lock); > + ioapic_update_eoi_exitmap(kvm); > } > > void kvm_unregister_irq_ack_notifier(struct kvm *kvm, > @@ -270,6 +289,7 @@ void 
kvm_unregister_irq_ack_notifier(struct kvm *kvm, > hlist_del_init_rcu(&kian->link); > mutex_unlock(&kvm->irq_lock); > synchronize_rcu(); > + ioapic_update_eoi_exitmap(kvm); > } > > int kvm_request_irq_source_id(struct kvm *kvm) > -- > 1.7.1 -- Gleb. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-07 7:51 ` Gleb Natapov @ 2013-01-07 13:04 ` Zhang, Yang Z 0 siblings, 0 replies; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-07 13:04 UTC (permalink / raw) To: Gleb Natapov Cc: kvm@vger.kernel.org, Shan, Haitao, mtosatti@redhat.com, Tian, Kevin Gleb Natapov wrote on 2013-01-07: > On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote: >> From: Yang Zhang <yang.z.zhang@Intel.com> >> >> Virtual interrupt delivery relieves KVM of injecting vAPIC interrupts >> manually; this is fully taken care of by the hardware. This needs >> some special awareness in the existing interrupt injection path: >> >> - for a pending interrupt, instead of direct injection, we may need to >> update architecture-specific indicators before resuming to the guest. >> - A pending interrupt that is masked by the ISR should also be >> considered in the above update action, since hardware will decide >> when to inject it at the right time. The current has_interrupt and >> get_interrupt only return a valid vector from the injection p.o.v. 
>> Signed-off-by: Kevin Tian <kevin.tian@intel.com> >> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> >> --- >> arch/ia64/kvm/lapic.h | 6 ++ >> arch/x86/include/asm/kvm_host.h | 8 ++ arch/x86/include/asm/vmx.h >> | 11 +++ arch/x86/kvm/irq.c | 56 +++++++++++- >> arch/x86/kvm/lapic.c | 87 +++++++++++------- >> arch/x86/kvm/lapic.h | 29 +++++- arch/x86/kvm/svm.c >> | 36 ++++++++ arch/x86/kvm/vmx.c | 190 >> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c >> | 11 ++- include/linux/kvm_host.h | 2 + virt/kvm/ioapic.c >> | 41 +++++++++ virt/kvm/ioapic.h | 1 >> + virt/kvm/irq_comm.c | 20 ++++ 13 files changed, 451 >> insertions(+), 47 deletions(-) >> diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h >> index c5f92a9..cb59eb4 100644 >> --- a/arch/ia64/kvm/lapic.h >> +++ b/arch/ia64/kvm/lapic.h >> @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct > kvm_lapic_irq *irq); >> #define kvm_apic_present(x) (true) >> #define kvm_lapic_enabled(x) (true) >> +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, >> + struct kvm_lapic_irq *irq) >> +{ >> + /* IA64 has no apicv supporting, do nothing here */ >> +} >> + >> #endif >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h index c431b33..135603f 100644 --- >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h >> @@ -697,6 +697,13 @@ struct kvm_x86_ops { >> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); >> void (*enable_irq_window)(struct kvm_vcpu *vcpu); >> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); >> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); >> + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); >> + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); >> + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); >> + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); >> + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); > The amount of callbacks to 
update the exit bitmap starts to become insane. As you suggest below, if a global lock is used, then three callbacks are enough >> + void (*restore_rvi)(struct kvm_vcpu *vcpu); > rvi? Call it set_svi() and make it do just that - set svi. Typo. >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index 0664c13..e1baf37 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic) >> return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic); >> } >> +bool kvm_apic_present(struct kvm_vcpu *vcpu) +{ + return >> kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); +} >> +EXPORT_SYMBOL_GPL(kvm_apic_present); + > Why is this change? Drop it. I cannot remember why. But it seems this change is unnecessary now. >> #define LVT_MASK \ >> (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK) >> @@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic) >> return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; >> } >> -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) >> -{ >> - u16 cid; >> - ldr >>= 32 - map->ldr_bits; >> - cid = (ldr >> map->cid_shift) & map->cid_mask; >> - >> - BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); >> - >> - return cid; >> -} >> - >> -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) >> -{ >> - ldr >>= (32 - map->ldr_bits); >> - return ldr & map->lid_mask; >> -} >> - >> static void recalculate_apic_map(struct kvm *kvm) >> { >> struct kvm_apic_map *new, *old = NULL; >> @@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic > *apic, u8 id) >> { apic_set_reg(apic, APIC_ID, id << 24); >> recalculate_apic_map(apic->vcpu->kvm); >> + ioapic_update_eoi_exitmap(apic->vcpu->kvm); } >> >> static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) { >> apic_set_reg(apic, APIC_LDR, id); >> recalculate_apic_map(apic->vcpu->kvm); >> + ioapic_update_eoi_exitmap(apic->vcpu->kvm); } >> 
>> static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type) >> @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct kvm_lapic > *apic) >> { >> int result; >> + /* Note that irr_pending is just a hint. In the platform >> + * that has virtual interrupt delivery supporting, virr >> + * is cleared by hardware without updating irr_pending. */ > Just say that with vid enabled, irr_pending will always be true. Sure. >> if (!apic->irr_pending) >> return -1; >> @@ -458,9 +452,13 @@ static void pv_eoi_clr_pending(struct kvm_vcpu > *vcpu) >> __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention); >> } >> -static inline int apic_find_highest_isr(struct kvm_lapic *apic) >> +int kvm_apic_find_highest_isr(struct kvm_lapic *apic) >> { >> int result; >> + >> + /* Note that isr_count is just a hint. In the platform >> + * that has virtual interrupt delivery supporting, visr >> + * will be set by hardware without updating irr_pending. */ > isr_count is not a hint! Without vid it has to be precise. Again just > say that with vid enabled isr_count is always 1. Ok. >> @@ -1318,6 +1338,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 > value) >> else static_key_slow_inc(&apic_hw_disabled.key); >> recalculate_apic_map(vcpu->kvm); >> + ioapic_update_eoi_exitmap(apic->vcpu->kvm); } >> >> if (!kvm_vcpu_is_bsp(apic->vcpu)) @@ -1375,7 +1396,7 @@ void >> kvm_lapic_reset(struct kvm_vcpu *vcpu) apic_set_reg(apic, APIC_TMR + >> 0x10 * i, 0); } apic->irr_pending = false; > irr_pending should be set to true if vid is enabled. Yes, it must be true when vid is enabled. >> @@ -433,6 +433,9 @@ struct vcpu_vmx { >> >> bool rdtscp_enabled; >> + unsigned long eoi_exit_bitmap[4]; >> + spinlock_t eoi_bitmap_lock; > Why not mutex? Sure. A mutex is a better choice. 
>> @@ -6099,6 +6138,142 @@ static void update_cr8_intercept(struct kvm_vcpu > *vcpu, int tpr, int irr) >> vmcs_write32(TPR_THRESHOLD, irr); >> } >> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >> +{ >> + return enable_apicv_reg_vid; >> +} >> + >> +static void vmx_restore_rvi(struct kvm_vcpu *vcpu) > This has nothing to do with rvi. Call it set_svi() and pass it the highest isr > as a parameter. Drop apic_find_highest_isr() renaming. Ok. >> +{ >> + int isr; >> + u16 status; >> + u8 old; >> + >> + if (!enable_apicv_reg_vid) >> + return; >> + >> + isr = kvm_apic_find_highest_isr(vcpu->arch.apic); >> + if (isr == -1) >> + return; > If there is no isr, svi should be set to zero. Right. >> + >> + status = vmcs_read16(GUEST_INTR_STATUS); >> + old = status >> 8; >> + if (isr != old) { >> + status &= 0xff; >> + status |= isr << 8; >> + vmcs_write16(GUEST_INTR_STATUS, status); >> + } >> +} >> + >> +static void vmx_update_rvi(int vector) > set_rvi() for consistency. Ok. >> +{ >> + u16 status; >> + u8 old; >> + >> + status = vmcs_read16(GUEST_INTR_STATUS); >> + old = (u8)status & 0xff; >> + if ((u8)vector != old) { >> + status &= ~0xff; >> + status |= (u8)vector; >> + vmcs_write16(GUEST_INTR_STATUS, status); >> + } >> +} >> + >> +static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr) >> +{ >> + if (max_irr == -1) >> + return; >> + >> + vmx_update_rvi(max_irr); >> +} >> + >> +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu, >> + u32 vector) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + if (!enable_apicv_reg_vid) >> + return; >> + >> + if (WARN_ONCE((vector > 255), >> + "KVM VMX: vector (%d) out of range\n", vector)) >> + return; >> + >> + set_bit(vector, vmx->eoi_exit_bitmap); > Since eoi_exit_bitmap is accessed under a lock, __set_bit() can be used > here. Right. 
>> + >> + kvm_make_request(KVM_REQ_EOIBITMAP, vcpu); >> +} >> + >> +void vmx_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq) >> +{ >> + struct kvm_vcpu *vcpu; >> + struct kvm_lapic **dst; >> + struct kvm_apic_map *map; >> + unsigned long bitmap = 1; >> + int i; >> + >> + rcu_read_lock(); >> + map = rcu_dereference(kvm->arch.apic_map); >> + >> + if (unlikely(!map)) { >> + kvm_for_each_vcpu(i, vcpu, kvm) >> + set_eoi_exitmap_one(vcpu, irq->vector); >> + goto out; >> + } >> + >> + if (irq->dest_mode == 0) { /* physical mode */ >> + if (irq->delivery_mode == APIC_DM_LOWEST || >> + irq->dest_id == 0xff) { >> + kvm_for_each_vcpu(i, vcpu, kvm) >> + set_eoi_exitmap_one(vcpu, irq->vector); >> + goto out; >> + } >> + dst = &map->phys_map[irq->dest_id & 0xff]; >> + } else { >> + u32 mda = irq->dest_id << (32 - map->ldr_bits); >> + >> + dst = map->logical_map[apic_cluster_id(map, mda)]; >> + >> + bitmap = apic_logical_id(map, mda); >> + } >> + >> + for_each_set_bit(i, &bitmap, 16) { >> + if (!dst[i]) >> + continue; >> + set_eoi_exitmap_one(dst[i]->vcpu, irq->vector); >> + } >> + >> +out: >> + rcu_read_unlock(); >> +} >> + >> +static void vmx_update_exitmap_start(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + memset(vmx->eoi_exit_bitmap, 0, 32); >> + spin_lock(&vmx->eoi_bitmap_lock); > So you introduced the lock and access the thing that the lock should protect > outside of it? It's really a careless mistake. I just copied it from another function and didn't notice that it ended up pasted after the memset. 
>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >> index f3abbef..4341ad3 100644 >> --- a/virt/kvm/ioapic.c >> +++ b/virt/kvm/ioapic.c >> @@ -115,6 +115,46 @@ static void update_handled_vectors(struct kvm_ioapic > *ioapic) >> smp_wmb(); >> } >> +static void ioapic_update_eoi_exitmap_one(struct kvm_ioapic *ioapic, int pin) >> +{ >> + union kvm_ioapic_redirect_entry *e; >> + >> + e = &ioapic->redirtbl[pin]; >> + >> + if ((e->fields.trig_mode == IOAPIC_LEVEL_TRIG || >> + kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin))) { >> + struct kvm_lapic_irq irqe; >> + >> + irqe.dest_id = e->fields.dest_id; >> + irqe.vector = e->fields.vector; >> + irqe.dest_mode = e->fields.dest_mode; >> + irqe.delivery_mode = e->fields.delivery_mode << 8; >> + kvm_x86_ops->update_eoi_exitmap(ioapic->kvm, &irqe); >> + } >> +} >> + >> +void ioapic_update_eoi_exitmap(struct kvm *kvm) >> +{ >> + struct kvm_ioapic *ioapic = kvm->arch.vioapic; >> + union kvm_ioapic_redirect_entry *e; >> + int index; >> + struct kvm_vcpu *vcpu; >> + > The function should return right here if vid is not enabled. Right. >> + kvm_for_each_vcpu(index, vcpu, kvm) >> + kvm_x86_ops->update_exitmap_start(vcpu); >> + >> + for (index = 0; index < IOAPIC_NUM_PINS; index++) { >> + e = &ioapic->redirtbl[index]; >> + if (!e->fields.mask) >> + ioapic_update_eoi_exitmap_one(ioapic, index); >> + } >> + >> + kvm_for_each_vcpu(index, vcpu, kvm) { >> + kvm_x86_ops->update_exitmap_end(vcpu); >> + kvm_vcpu_kick(vcpu); >> + } > Since you take all of the locks at the start of the bitmap update and release > them at the end why not have global lock? Right. Global lock is more reasonable. Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-07 2:02 ` [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang 2013-01-07 7:51 ` Gleb Natapov @ 2013-01-07 13:52 ` Marcelo Tosatti 2013-01-07 17:48 ` Gleb Natapov 1 sibling, 1 reply; 22+ messages in thread From: Marcelo Tosatti @ 2013-01-07 13:52 UTC (permalink / raw) To: Yang Zhang; +Cc: kvm, gleb, haitao.shan, Kevin Tian On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote: > From: Yang Zhang <yang.z.zhang@Intel.com> > > Virtual interrupt delivery relieves KVM of injecting vAPIC interrupts > manually; this is fully taken care of by the hardware. This needs > some special awareness in the existing interrupt injection path: > > - for a pending interrupt, instead of direct injection, we may need to > update architecture-specific indicators before resuming to the guest. > > - A pending interrupt that is masked by the ISR should also be > considered in the above update action, since hardware will decide > when to inject it at the right time. The current has_interrupt and > get_interrupt only return a valid vector from the injection p.o.v. 
> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > --- > arch/ia64/kvm/lapic.h | 6 ++ > arch/x86/include/asm/kvm_host.h | 8 ++ > arch/x86/include/asm/vmx.h | 11 +++ > arch/x86/kvm/irq.c | 56 +++++++++++- > arch/x86/kvm/lapic.c | 87 +++++++++++------- > arch/x86/kvm/lapic.h | 29 +++++- > arch/x86/kvm/svm.c | 36 ++++++++ > arch/x86/kvm/vmx.c | 190 ++++++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 11 ++- > include/linux/kvm_host.h | 2 + > virt/kvm/ioapic.c | 41 +++++++++ > virt/kvm/ioapic.h | 1 + > virt/kvm/irq_comm.c | 20 ++++ > 13 files changed, 451 insertions(+), 47 deletions(-) > > diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h > index c5f92a9..cb59eb4 100644 > --- a/arch/ia64/kvm/lapic.h > +++ b/arch/ia64/kvm/lapic.h > @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > #define kvm_apic_present(x) (true) > #define kvm_lapic_enabled(x) (true) > > +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, > + struct kvm_lapic_irq *irq) > +{ > + /* IA64 has no apicv supporting, do nothing here */ > +} > + > #endif > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index c431b33..135603f 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -697,6 +697,13 @@ struct kvm_x86_ops { > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); > + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); > + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); > + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); > + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); > + void (*restore_rvi)(struct kvm_vcpu *vcpu); > int 
(*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); > int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); > void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); > int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); > int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); > int kvm_cpu_get_interrupt(struct kvm_vcpu *v); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 44c3f7e..d1ab331 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -62,6 +62,7 @@ > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > +#define EXIT_REASON_EOI_INDUCED 45 > #define EXIT_REASON_EPT_VIOLATION 48 > #define EXIT_REASON_EPT_MISCONFIG 49 > #define EXIT_REASON_WBINVD 54 > @@ -143,6 +144,7 @@ > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > @@ -180,6 +182,7 @@ enum vmcs_field { > GUEST_GS_SELECTOR = 0x0000080a, > GUEST_LDTR_SELECTOR = 0x0000080c, > GUEST_TR_SELECTOR = 0x0000080e, > + GUEST_INTR_STATUS = 0x00000810, > HOST_ES_SELECTOR = 0x00000c00, > HOST_CS_SELECTOR = 0x00000c02, > HOST_SS_SELECTOR = 0x00000c04, > @@ -207,6 +210,14 @@ enum vmcs_field { > APIC_ACCESS_ADDR_HIGH = 0x00002015, > EPT_POINTER = 0x0000201a, > EPT_POINTER_HIGH = 0x0000201b, > + EOI_EXIT_BITMAP0 = 0x0000201c, > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > + EOI_EXIT_BITMAP1 = 0x0000201e, > + EOI_EXIT_BITMAP1_HIGH = 
0x0000201f, > + EOI_EXIT_BITMAP2 = 0x00002020, > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > + EOI_EXIT_BITMAP3 = 0x00002022, > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > GUEST_PHYSICAL_ADDRESS = 0x00002400, > GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > VMCS_LINK_POINTER = 0x00002800, > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > index b111aee..e113440 100644 > --- a/arch/x86/kvm/irq.c > +++ b/arch/x86/kvm/irq.c > @@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) > EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > > /* > + * check if there is pending interrupt from > + * non-APIC source without intack. > + */ > +static int kvm_cpu_has_extint(struct kvm_vcpu *v) > +{ > + if (kvm_apic_accept_pic_intr(v)) > + return pic_irqchip(v->kvm)->output; /* PIC */ > + else > + return 0; > +} > + > +/* > + * check if there is injectable interrupt: > + * when virtual interrupt delivery enabled, > + * interrupt from apic will handled by hardware, > + * we don't need to check it here. > + */ > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) > +{ > + if (!irqchip_in_kernel(v->kvm)) > + return v->arch.interrupt.pending; > + > + if (kvm_cpu_has_extint(v)) > + return 1; > + > + if (kvm_apic_vid_enabled(v)) > + return 0; > + > + return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > +} > + > +/* > * check if there is pending interrupt without > * intack. > */ > @@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.pending; > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > - return pic_irqchip(v->kvm)->output; /* PIC */ > + if (kvm_cpu_has_extint(v)) > + return 1; > > return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > } > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > /* > + * Read pending interrupt(from non-APIC source) > + * vector and intack. 
> + */ > +static int kvm_cpu_get_extint(struct kvm_vcpu *v) > +{ > + if (kvm_cpu_has_extint(v)) > + return kvm_pic_read_irq(v->kvm); /* PIC */ > + return -1; > +} > + > +/* > * Read pending interrupt vector and intack. > */ > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > { > + int vector; > + > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.nr; > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > - return kvm_pic_read_irq(v->kvm); /* PIC */ > + vector = kvm_cpu_get_extint(v); > + > + if (kvm_apic_vid_enabled(v) || vector != -1) > + return vector; /* PIC */ > > return kvm_get_apic_interrupt(v); /* APIC */ > } > -EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > { > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 0664c13..e1baf37 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic) > return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic); > } > > +bool kvm_apic_present(struct kvm_vcpu *vcpu) > +{ > + return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_present); > + > #define LVT_MASK \ > (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK) > > @@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic) > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; > } > > -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > -{ > - u16 cid; > - ldr >>= 32 - map->ldr_bits; > - cid = (ldr >> map->cid_shift) & map->cid_mask; > - > - BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > - > - return cid; > -} > - > -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > -{ > - ldr >>= (32 - map->ldr_bits); > - return ldr & map->lid_mask; > -} > - > static void recalculate_apic_map(struct kvm *kvm) > { > struct kvm_apic_map *new, *old = NULL; > @@ -236,12 +225,14 @@ static 
inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id) > { > apic_set_reg(apic, APIC_ID, id << 24); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) > { > apic_set_reg(apic, APIC_LDR, id); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type) > @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic) > { > int result; > > + /* Note that irr_pending is just a hint. In the platform > + * that has virtual interrupt delivery supporting, virr > + * is cleared by hardware without updating irr_pending. */ > if (!apic->irr_pending) > return -1; > > @@ -458,9 +452,13 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) > __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention); > } > > -static inline int apic_find_highest_isr(struct kvm_lapic *apic) > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic) > { > int result; > + > + /* Note that isr_count is just a hint. In the platform > + * that has virtual interrupt delivery supporting, visr > + * will be set by hardware without updating irr_pending. */ > if (!apic->isr_count) > return -1; > if (likely(apic->highest_isr_cache != -1)) > @@ -471,6 +469,7 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic) > > return result; > } > +EXPORT_SYMBOL_GPL(kvm_apic_find_highest_isr); > > static void apic_update_ppr(struct kvm_lapic *apic) > { > @@ -479,7 +478,7 @@ static void apic_update_ppr(struct kvm_lapic *apic) > > old_ppr = kvm_apic_get_reg(apic, APIC_PROCPRI); > tpr = kvm_apic_get_reg(apic, APIC_TASKPRI); > - isr = apic_find_highest_isr(apic); > + isr = kvm_apic_find_highest_isr(apic); > isrv = (isr != -1) ? 
isr : 0; > > if ((tpr & 0xf0) >= (isrv & 0xf0)) > @@ -740,9 +739,22 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > } > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > +{ > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > + int trigger_mode; > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > + trigger_mode = IOAPIC_LEVEL_TRIG; > + else > + trigger_mode = IOAPIC_EDGE_TRIG; > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > + } > +} > + > static int apic_set_eoi(struct kvm_lapic *apic) > { > - int vector = apic_find_highest_isr(apic); > + int vector = kvm_apic_find_highest_isr(apic); > > trace_kvm_eoi(apic, vector); > > @@ -756,19 +768,26 @@ static int apic_set_eoi(struct kvm_lapic *apic) > apic_clear_isr(vector, apic); > apic_update_ppr(apic); > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > - int trigger_mode; > - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > - trigger_mode = IOAPIC_LEVEL_TRIG; > - else > - trigger_mode = IOAPIC_EDGE_TRIG; > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > - } > + kvm_ioapic_send_eoi(apic, vector); > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > return vector; > } > > +/* > + * this interface assumes a trap-like exit, which has already finished > + * desired side effect including vISR and vPPR update. 
> + */ > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + trace_kvm_eoi(apic, vector); > + > + kvm_ioapic_send_eoi(apic, vector); > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > + > static void apic_send_ipi(struct kvm_lapic *apic) > { > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > @@ -1071,6 +1090,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) > if (!apic_x2apic_mode(apic)) { > apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF); > recalculate_apic_map(apic->vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } else > ret = 1; > break; > @@ -1318,6 +1338,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) > else > static_key_slow_inc(&apic_hw_disabled.key); > recalculate_apic_map(vcpu->kvm); > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > } > > if (!kvm_vcpu_is_bsp(apic->vcpu)) > @@ -1375,7 +1396,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu) > apic_set_reg(apic, APIC_TMR + 0x10 * i, 0); > } > apic->irr_pending = false; > - apic->isr_count = 0; > + apic->isr_count = kvm_apic_vid_enabled(vcpu); > apic->highest_isr_cache = -1; > update_divide_count(apic); > atomic_set(&apic->lapic_timer.pending, 0); > @@ -1590,8 +1611,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > update_divide_count(apic); > start_apic_timer(apic); > apic->irr_pending = true; > - apic->isr_count = count_vectors(apic->regs + APIC_ISR); > + apic->isr_count = kvm_apic_vid_enabled(vcpu) ? 
> + 1 : count_vectors(apic->regs + APIC_ISR); > apic->highest_isr_cache = -1; > + kvm_x86_ops->restore_rvi(vcpu); > kvm_make_request(KVM_REQ_EVENT, vcpu); > } > > @@ -1704,7 +1727,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu) > max_irr = apic_find_highest_irr(apic); > if (max_irr < 0) > max_irr = 0; > - max_isr = apic_find_highest_isr(apic); > + max_isr = kvm_apic_find_highest_isr(apic); > if (max_isr < 0) > max_isr = 0; > data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24); > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 9a8ee22..52c66a5 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -39,6 +39,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > void kvm_lapic_reset(struct kvm_vcpu *vcpu); > u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); > void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); > @@ -60,11 +61,13 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data); > void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > struct kvm_lapic_state *s); > int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic); > > u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > > void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > > void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > @@ -75,6 +78,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); > > int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); > int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, 
u32 msr, u64 *data); > +bool kvm_apic_present(struct kvm_vcpu *vcpu); > > static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu) > { > @@ -116,14 +120,31 @@ static inline int kvm_apic_sw_enabled(struct kvm_lapic *apic) > return APIC_SPIV_APIC_ENABLED; > } > > -static inline bool kvm_apic_present(struct kvm_vcpu *vcpu) > +static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > { > - return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > + return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > } > > -static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > { > - return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); > +} > + > +static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > +{ > + u16 cid; > + ldr >>= 32 - map->ldr_bits; > + cid = (ldr >> map->cid_shift) & map->cid_mask; > + > + BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > + > + return cid; > +} > + > +static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > +{ > + ldr >>= (32 - map->ldr_bits); > + return ldr & map->lid_mask; > } > > #endif > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index d29d3cd..a8a8a4e 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3571,6 +3571,36 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > } > > +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return 0; > +} > + > +static void svm_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq) > +{ > + return; > +} > + > +static void svm_update_exitmap_start(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > +static void svm_update_exitmap_end(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > +static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu) > +{ > + 
return; > +} > + > +static void svm_restore_rvi(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -4290,6 +4320,12 @@ static struct kvm_x86_ops svm_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > + .update_eoi_exitmap = svm_update_eoi_exitmap, > + .update_exitmap_start = svm_update_exitmap_start, > + .update_exitmap_end = svm_update_exitmap_end, > + .load_eoi_exitmap = svm_load_eoi_exitmap, > + .restore_rvi = svm_restore_rvi, > > .set_tss_addr = svm_set_tss_addr, > .get_tdp_level = get_npt_level, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 730dc13..0c85c7e 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -433,6 +433,9 @@ struct vcpu_vmx { > > bool rdtscp_enabled; > > + unsigned long eoi_exit_bitmap[4]; > + spinlock_t eoi_bitmap_lock; > + > /* Support for a guest hypervisor (nested VMX) */ > struct nested_vmx nested; > }; > @@ -771,6 +774,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void) > SECONDARY_EXEC_APIC_REGISTER_VIRT; > } > > +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > +{ > + return vmcs_config.cpu_based_2nd_exec_ctrl & > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > +} > + > static inline bool cpu_has_vmx_flexpriority(void) > { > return cpu_has_vmx_tpr_shadow() && > @@ -2549,7 +2558,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > SECONDARY_EXEC_PAUSE_LOOP_EXITING | > SECONDARY_EXEC_RDTSCP | > SECONDARY_EXEC_ENABLE_INVPCID | > - SECONDARY_EXEC_APIC_REGISTER_VIRT; > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > if (adjust_vmx_controls(min2, opt2, > MSR_IA32_VMX_PROCBASED_CTLS2, > &_cpu_based_2nd_exec_control) < 0) > @@ -2563,7 +2573,8 @@ static __init int 
setup_vmcs_config(struct vmcs_config *vmcs_conf) > > if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > _cpu_based_2nd_exec_control &= ~( > - SECONDARY_EXEC_APIC_REGISTER_VIRT); > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { > /* CR3 accesses and invlpg don't need to cause VM Exits when EPT > @@ -2762,9 +2773,15 @@ static __init int hardware_setup(void) > if (!cpu_has_vmx_ple()) > ple_gap = 0; > > - if (!cpu_has_vmx_apic_register_virt()) > + if (!cpu_has_vmx_apic_register_virt() || > + !cpu_has_vmx_virtual_intr_delivery()) > enable_apicv_reg_vid = 0; > > + if (enable_apicv_reg_vid) > + kvm_x86_ops->update_cr8_intercept = NULL; > + else > + kvm_x86_ops->update_apic_irq = NULL; > + > if (nested) > nested_vmx_setup_ctls_msrs(); > > @@ -3845,7 +3862,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > if (!ple_gap) > exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > if (!enable_apicv_reg_vid) > - exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > + exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > return exec_control; > } > > @@ -3890,6 +3908,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > vmx_secondary_exec_control(vmx)); > } > > + if (enable_apicv_reg_vid) { > + vmcs_write64(EOI_EXIT_BITMAP0, 0); > + vmcs_write64(EOI_EXIT_BITMAP1, 0); > + vmcs_write64(EOI_EXIT_BITMAP2, 0); > + vmcs_write64(EOI_EXIT_BITMAP3, 0); > + spin_lock_init(&vmx->eoi_bitmap_lock); > + > + vmcs_write16(GUEST_INTR_STATUS, 0); > + } > + > if (ple_gap) { > vmcs_write32(PLE_GAP, ple_gap); > vmcs_write32(PLE_WINDOW, ple_window); > @@ -4805,6 +4833,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu) > return emulate_instruction(vcpu, 0) == EMULATE_DONE; > } > > +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > +{ > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > + int vector = 
exit_qualification & 0xff; > + > + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > + kvm_apic_set_eoi_accelerated(vcpu, vector); > + return 1; > +} > + > static int handle_apic_write(struct kvm_vcpu *vcpu) > { > unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > @@ -5750,6 +5788,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { > [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > [EXIT_REASON_APIC_WRITE] = handle_apic_write, > + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > [EXIT_REASON_WBINVD] = handle_wbinvd, > [EXIT_REASON_XSETBV] = handle_xsetbv, > [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > @@ -6099,6 +6138,142 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > vmcs_write32(TPR_THRESHOLD, irr); > } > > +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return enable_apicv_reg_vid; > +} > + > +static void vmx_restore_rvi(struct kvm_vcpu *vcpu) > +{ > + int isr; > + u16 status; > + u8 old; > + > + if (!enable_apicv_reg_vid) > + return; > + > + isr = kvm_apic_find_highest_isr(vcpu->arch.apic); > + if (isr == -1) > + return; > + > + status = vmcs_read16(GUEST_INTR_STATUS); > + old = status >> 8; > + if (isr != old) { > + status &= 0xff; > + status |= isr << 8; > + vmcs_write16(GUEST_INTR_STATUS, status); > + } > +} > + > +static void vmx_update_rvi(int vector) > +{ > + u16 status; > + u8 old; > + > + status = vmcs_read16(GUEST_INTR_STATUS); > + old = (u8)status & 0xff; > + if ((u8)vector != old) { > + status &= ~0xff; > + status |= (u8)vector; > + vmcs_write16(GUEST_INTR_STATUS, status); > + } > +} > + > +static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr) > +{ > + if (max_irr == -1) > + return; > + > + vmx_update_rvi(max_irr); > +} > + > +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu, > + u32 vector) > +{ > + struct vcpu_vmx *vmx = 
to_vmx(vcpu);
> +
> +	if (!enable_apicv_reg_vid)
> +		return;
> +
> +	if (WARN_ONCE((vector > 255),
> +		"KVM VMX: vector (%d) out of range\n", vector))
> +		return;
> +
> +	set_bit(vector, vmx->eoi_exit_bitmap);
> +
> +	kvm_make_request(KVM_REQ_EOIBITMAP, vcpu);
> +}
> +
> +void vmx_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq)
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_lapic **dst;
> +	struct kvm_apic_map *map;
> +	unsigned long bitmap = 1;
> +	int i;
> +
> +	rcu_read_lock();
> +	map = rcu_dereference(kvm->arch.apic_map);
> +
> +	if (unlikely(!map)) {
> +		kvm_for_each_vcpu(i, vcpu, kvm)
> +			set_eoi_exitmap_one(vcpu, irq->vector);
> +		goto out;
> +	}

The suggestion was:

ioapic_write (or any other ioapic update)
	lock()
	perform update
	make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
	unlock()

(*) Similarly to TLB flush.

The advantage is that all work becomes vcpu local. The end result is
much simpler code.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
  2013-01-07 13:52   ` Marcelo Tosatti
@ 2013-01-07 17:48     ` Gleb Natapov
  2013-01-07 21:32       ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Gleb Natapov @ 2013-01-07 17:48 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, Kevin Tian

On Mon, Jan 07, 2013 at 11:52:21AM -0200, Marcelo Tosatti wrote:
> On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote:
> > From: Yang Zhang <yang.z.zhang@Intel.com>
> >
> > Virtual interrupt delivery removes the need for KVM to inject vAPIC
> > interrupts manually; delivery is fully taken care of by the hardware.
> > This requires some special awareness in the existing interrupt
> > injection path:
> >
> > - For a pending interrupt, instead of injecting it directly, we may
> >   need to update architecture-specific indicators before resuming
> >   the guest.
> >
> > - A pending interrupt that is masked by the ISR should also be
> >   considered in the above update, since hardware will decide when to
> >   inject it. The current has_interrupt and get_interrupt only return
> >   a valid vector from the injection point of view.
> > > > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > > --- > > arch/ia64/kvm/lapic.h | 6 ++ > > arch/x86/include/asm/kvm_host.h | 8 ++ > > arch/x86/include/asm/vmx.h | 11 +++ > > arch/x86/kvm/irq.c | 56 +++++++++++- > > arch/x86/kvm/lapic.c | 87 +++++++++++------- > > arch/x86/kvm/lapic.h | 29 +++++- > > arch/x86/kvm/svm.c | 36 ++++++++ > > arch/x86/kvm/vmx.c | 190 ++++++++++++++++++++++++++++++++++++++- > > arch/x86/kvm/x86.c | 11 ++- > > include/linux/kvm_host.h | 2 + > > virt/kvm/ioapic.c | 41 +++++++++ > > virt/kvm/ioapic.h | 1 + > > virt/kvm/irq_comm.c | 20 ++++ > > 13 files changed, 451 insertions(+), 47 deletions(-) > > > > diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h > > index c5f92a9..cb59eb4 100644 > > --- a/arch/ia64/kvm/lapic.h > > +++ b/arch/ia64/kvm/lapic.h > > @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > > #define kvm_apic_present(x) (true) > > #define kvm_lapic_enabled(x) (true) > > > > +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, > > + struct kvm_lapic_irq *irq) > > +{ > > + /* IA64 has no apicv supporting, do nothing here */ > > +} > > + > > #endif > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index c431b33..135603f 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -697,6 +697,13 @@ struct kvm_x86_ops { > > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > > + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); > > + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); > > + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); > > + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); > > + void 
(*load_eoi_exitmap)(struct kvm_vcpu *vcpu); > > + void (*restore_rvi)(struct kvm_vcpu *vcpu); > > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > > int (*get_tdp_level)(void); > > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > > @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); > > int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); > > void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); > > int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); > > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); > > int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); > > int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); > > int kvm_cpu_get_interrupt(struct kvm_vcpu *v); > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > > index 44c3f7e..d1ab331 100644 > > --- a/arch/x86/include/asm/vmx.h > > +++ b/arch/x86/include/asm/vmx.h > > @@ -62,6 +62,7 @@ > > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > > #define EXIT_REASON_APIC_ACCESS 44 > > +#define EXIT_REASON_EOI_INDUCED 45 > > #define EXIT_REASON_EPT_VIOLATION 48 > > #define EXIT_REASON_EPT_MISCONFIG 49 > > #define EXIT_REASON_WBINVD 54 > > @@ -143,6 +144,7 @@ > > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > > > @@ -180,6 +182,7 @@ enum vmcs_field { > > GUEST_GS_SELECTOR = 0x0000080a, > > GUEST_LDTR_SELECTOR = 0x0000080c, > > GUEST_TR_SELECTOR = 0x0000080e, > > + GUEST_INTR_STATUS = 0x00000810, > > HOST_ES_SELECTOR = 0x00000c00, > > HOST_CS_SELECTOR = 0x00000c02, > > HOST_SS_SELECTOR = 0x00000c04, > > @@ -207,6 +210,14 @@ enum vmcs_field { > > APIC_ACCESS_ADDR_HIGH = 0x00002015, > > EPT_POINTER = 
0x0000201a, > > EPT_POINTER_HIGH = 0x0000201b, > > + EOI_EXIT_BITMAP0 = 0x0000201c, > > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > > + EOI_EXIT_BITMAP1 = 0x0000201e, > > + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > > + EOI_EXIT_BITMAP2 = 0x00002020, > > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > > + EOI_EXIT_BITMAP3 = 0x00002022, > > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > > GUEST_PHYSICAL_ADDRESS = 0x00002400, > > GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > > VMCS_LINK_POINTER = 0x00002800, > > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > > index b111aee..e113440 100644 > > --- a/arch/x86/kvm/irq.c > > +++ b/arch/x86/kvm/irq.c > > @@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) > > EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > > > > /* > > + * check if there is pending interrupt from > > + * non-APIC source without intack. > > + */ > > +static int kvm_cpu_has_extint(struct kvm_vcpu *v) > > +{ > > + if (kvm_apic_accept_pic_intr(v)) > > + return pic_irqchip(v->kvm)->output; /* PIC */ > > + else > > + return 0; > > +} > > + > > +/* > > + * check if there is injectable interrupt: > > + * when virtual interrupt delivery enabled, > > + * interrupt from apic will handled by hardware, > > + * we don't need to check it here. > > + */ > > +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) > > +{ > > + if (!irqchip_in_kernel(v->kvm)) > > + return v->arch.interrupt.pending; > > + > > + if (kvm_cpu_has_extint(v)) > > + return 1; > > + > > + if (kvm_apic_vid_enabled(v)) > > + return 0; > > + > > + return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > > +} > > + > > +/* > > * check if there is pending interrupt without > > * intack. 
> > */ > > @@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > > if (!irqchip_in_kernel(v->kvm)) > > return v->arch.interrupt.pending; > > > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > > - return pic_irqchip(v->kvm)->output; /* PIC */ > > + if (kvm_cpu_has_extint(v)) > > + return 1; > > > > return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ > > } > > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > > > /* > > + * Read pending interrupt(from non-APIC source) > > + * vector and intack. > > + */ > > +static int kvm_cpu_get_extint(struct kvm_vcpu *v) > > +{ > > + if (kvm_cpu_has_extint(v)) > > + return kvm_pic_read_irq(v->kvm); /* PIC */ > > + return -1; > > +} > > + > > +/* > > * Read pending interrupt vector and intack. > > */ > > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > > { > > + int vector; > > + > > if (!irqchip_in_kernel(v->kvm)) > > return v->arch.interrupt.nr; > > > > - if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output) > > - return kvm_pic_read_irq(v->kvm); /* PIC */ > > + vector = kvm_cpu_get_extint(v); > > + > > + if (kvm_apic_vid_enabled(v) || vector != -1) > > + return vector; /* PIC */ > > > > return kvm_get_apic_interrupt(v); /* APIC */ > > } > > -EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > > > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > > { > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index 0664c13..e1baf37 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic) > > return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic); > > } > > > > +bool kvm_apic_present(struct kvm_vcpu *vcpu) > > +{ > > + return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > > +} > > +EXPORT_SYMBOL_GPL(kvm_apic_present); > > + > > #define LVT_MASK \ > > (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK) > > > > @@ -150,23 +156,6 @@ static inline int 
kvm_apic_id(struct kvm_lapic *apic) > > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; > > } > > > > -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > > -{ > > - u16 cid; > > - ldr >>= 32 - map->ldr_bits; > > - cid = (ldr >> map->cid_shift) & map->cid_mask; > > - > > - BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > > - > > - return cid; > > -} > > - > > -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > > -{ > > - ldr >>= (32 - map->ldr_bits); > > - return ldr & map->lid_mask; > > -} > > - > > static void recalculate_apic_map(struct kvm *kvm) > > { > > struct kvm_apic_map *new, *old = NULL; > > @@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id) > > { > > apic_set_reg(apic, APIC_ID, id << 24); > > recalculate_apic_map(apic->vcpu->kvm); > > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > > } > > > > static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) > > { > > apic_set_reg(apic, APIC_LDR, id); > > recalculate_apic_map(apic->vcpu->kvm); > > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > > } > > > > static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type) > > @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic) > > { > > int result; > > > > + /* Note that irr_pending is just a hint. In the platform > > + * that has virtual interrupt delivery supporting, virr > > + * is cleared by hardware without updating irr_pending. */ > > if (!apic->irr_pending) > > return -1; > > > > @@ -458,9 +452,13 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) > > __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention); > > } > > > > -static inline int apic_find_highest_isr(struct kvm_lapic *apic) > > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic) > > { > > int result; > > + > > + /* Note that isr_count is just a hint. 
In the platform > > + * that has virtual interrupt delivery supporting, visr > > + * will be set by hardware without updating irr_pending. */ > > if (!apic->isr_count) > > return -1; > > if (likely(apic->highest_isr_cache != -1)) > > @@ -471,6 +469,7 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic) > > > > return result; > > } > > +EXPORT_SYMBOL_GPL(kvm_apic_find_highest_isr); > > > > static void apic_update_ppr(struct kvm_lapic *apic) > > { > > @@ -479,7 +478,7 @@ static void apic_update_ppr(struct kvm_lapic *apic) > > > > old_ppr = kvm_apic_get_reg(apic, APIC_PROCPRI); > > tpr = kvm_apic_get_reg(apic, APIC_TASKPRI); > > - isr = apic_find_highest_isr(apic); > > + isr = kvm_apic_find_highest_isr(apic); > > isrv = (isr != -1) ? isr : 0; > > > > if ((tpr & 0xf0) >= (isrv & 0xf0)) > > @@ -740,9 +739,22 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > > } > > > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > > +{ > > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > > + int trigger_mode; > > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > > + trigger_mode = IOAPIC_LEVEL_TRIG; > > + else > > + trigger_mode = IOAPIC_EDGE_TRIG; > > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > > + } > > +} > > + > > static int apic_set_eoi(struct kvm_lapic *apic) > > { > > - int vector = apic_find_highest_isr(apic); > > + int vector = kvm_apic_find_highest_isr(apic); > > > > trace_kvm_eoi(apic, vector); > > > > @@ -756,19 +768,26 @@ static int apic_set_eoi(struct kvm_lapic *apic) > > apic_clear_isr(vector, apic); > > apic_update_ppr(apic); > > > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > > - int trigger_mode; > > - if (apic_test_vector(vector, 
apic->regs + APIC_TMR)) > > - trigger_mode = IOAPIC_LEVEL_TRIG; > > - else > > - trigger_mode = IOAPIC_EDGE_TRIG; > > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > > - } > > + kvm_ioapic_send_eoi(apic, vector); > > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > > return vector; > > } > > > > +/* > > + * this interface assumes a trap-like exit, which has already finished > > + * desired side effect including vISR and vPPR update. > > + */ > > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > > +{ > > + struct kvm_lapic *apic = vcpu->arch.apic; > > + > > + trace_kvm_eoi(apic, vector); > > + > > + kvm_ioapic_send_eoi(apic, vector); > > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > > +} > > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > > + > > static void apic_send_ipi(struct kvm_lapic *apic) > > { > > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > > @@ -1071,6 +1090,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) > > if (!apic_x2apic_mode(apic)) { > > apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF); > > recalculate_apic_map(apic->vcpu->kvm); > > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > > } else > > ret = 1; > > break; > > @@ -1318,6 +1338,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) > > else > > static_key_slow_inc(&apic_hw_disabled.key); > > recalculate_apic_map(vcpu->kvm); > > + ioapic_update_eoi_exitmap(apic->vcpu->kvm); > > } > > > > if (!kvm_vcpu_is_bsp(apic->vcpu)) > > @@ -1375,7 +1396,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu) > > apic_set_reg(apic, APIC_TMR + 0x10 * i, 0); > > } > > apic->irr_pending = false; > > - apic->isr_count = 0; > > + apic->isr_count = kvm_apic_vid_enabled(vcpu); > > apic->highest_isr_cache = -1; > > update_divide_count(apic); > > atomic_set(&apic->lapic_timer.pending, 0); > > @@ -1590,8 +1611,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > > update_divide_count(apic); > > start_apic_timer(apic); > > 
apic->irr_pending = true; > > - apic->isr_count = count_vectors(apic->regs + APIC_ISR); > > + apic->isr_count = kvm_apic_vid_enabled(vcpu) ? > > + 1 : count_vectors(apic->regs + APIC_ISR); > > apic->highest_isr_cache = -1; > > + kvm_x86_ops->restore_rvi(vcpu); > > kvm_make_request(KVM_REQ_EVENT, vcpu); > > } > > > > @@ -1704,7 +1727,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu) > > max_irr = apic_find_highest_irr(apic); > > if (max_irr < 0) > > max_irr = 0; > > - max_isr = apic_find_highest_isr(apic); > > + max_isr = kvm_apic_find_highest_isr(apic); > > if (max_isr < 0) > > max_isr = 0; > > data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24); > > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > > index 9a8ee22..52c66a5 100644 > > --- a/arch/x86/kvm/lapic.h > > +++ b/arch/x86/kvm/lapic.h > > @@ -39,6 +39,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > > int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > > void kvm_lapic_reset(struct kvm_vcpu *vcpu); > > u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); > > void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); > > @@ -60,11 +61,13 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data); > > void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > > struct kvm_lapic_state *s); > > int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); > > +int kvm_apic_find_highest_isr(struct kvm_lapic *apic); > > > > u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > > void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > > > > void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > > > > void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > > void kvm_lapic_sync_from_vapic(struct 
kvm_vcpu *vcpu); > > @@ -75,6 +78,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); > > > > int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); > > int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); > > +bool kvm_apic_present(struct kvm_vcpu *vcpu); > > > > static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu) > > { > > @@ -116,14 +120,31 @@ static inline int kvm_apic_sw_enabled(struct kvm_lapic *apic) > > return APIC_SPIV_APIC_ENABLED; > > } > > > > -static inline bool kvm_apic_present(struct kvm_vcpu *vcpu) > > +static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > > { > > - return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); > > + return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > > } > > > > -static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) > > +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > > { > > - return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); > > + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); > > +} > > + > > +static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) > > +{ > > + u16 cid; > > + ldr >>= 32 - map->ldr_bits; > > + cid = (ldr >> map->cid_shift) & map->cid_mask; > > + > > + BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); > > + > > + return cid; > > +} > > + > > +static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) > > +{ > > + ldr >>= (32 - map->ldr_bits); > > + return ldr & map->lid_mask; > > } > > > > #endif > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > > index d29d3cd..a8a8a4e 100644 > > --- a/arch/x86/kvm/svm.c > > +++ b/arch/x86/kvm/svm.c > > @@ -3571,6 +3571,36 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > > set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > > } > > > > +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > > +{ > > + return 0; > > +} > > + > > 
+static void svm_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq) > > +{ > > + return; > > +} > > + > > +static void svm_update_exitmap_start(struct kvm_vcpu *vcpu) > > +{ > > + return; > > +} > > + > > +static void svm_update_exitmap_end(struct kvm_vcpu *vcpu) > > +{ > > + return; > > +} > > + > > +static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu) > > +{ > > + return; > > +} > > + > > +static void svm_restore_rvi(struct kvm_vcpu *vcpu) > > +{ > > + return; > > +} > > + > > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > > { > > struct vcpu_svm *svm = to_svm(vcpu); > > @@ -4290,6 +4320,12 @@ static struct kvm_x86_ops svm_x86_ops = { > > .enable_nmi_window = enable_nmi_window, > > .enable_irq_window = enable_irq_window, > > .update_cr8_intercept = update_cr8_intercept, > > + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > > + .update_eoi_exitmap = svm_update_eoi_exitmap, > > + .update_exitmap_start = svm_update_exitmap_start, > > + .update_exitmap_end = svm_update_exitmap_end, > > + .load_eoi_exitmap = svm_load_eoi_exitmap, > > + .restore_rvi = svm_restore_rvi, > > > > .set_tss_addr = svm_set_tss_addr, > > .get_tdp_level = get_npt_level, > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index 730dc13..0c85c7e 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -433,6 +433,9 @@ struct vcpu_vmx { > > > > bool rdtscp_enabled; > > > > + unsigned long eoi_exit_bitmap[4]; > > + spinlock_t eoi_bitmap_lock; > > + > > /* Support for a guest hypervisor (nested VMX) */ > > struct nested_vmx nested; > > }; > > @@ -771,6 +774,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void) > > SECONDARY_EXEC_APIC_REGISTER_VIRT; > > } > > > > +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > > +{ > > + return vmcs_config.cpu_based_2nd_exec_ctrl & > > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > > +} > > + > > static inline bool cpu_has_vmx_flexpriority(void) > > { > > return 
cpu_has_vmx_tpr_shadow() && > > @@ -2549,7 +2558,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > > SECONDARY_EXEC_PAUSE_LOOP_EXITING | > > SECONDARY_EXEC_RDTSCP | > > SECONDARY_EXEC_ENABLE_INVPCID | > > - SECONDARY_EXEC_APIC_REGISTER_VIRT; > > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > > if (adjust_vmx_controls(min2, opt2, > > MSR_IA32_VMX_PROCBASED_CTLS2, > > &_cpu_based_2nd_exec_control) < 0) > > @@ -2563,7 +2573,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > > > > if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > > _cpu_based_2nd_exec_control &= ~( > > - SECONDARY_EXEC_APIC_REGISTER_VIRT); > > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > > > if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { > > /* CR3 accesses and invlpg don't need to cause VM Exits when EPT > > @@ -2762,9 +2773,15 @@ static __init int hardware_setup(void) > > if (!cpu_has_vmx_ple()) > > ple_gap = 0; > > > > - if (!cpu_has_vmx_apic_register_virt()) > > + if (!cpu_has_vmx_apic_register_virt() || > > + !cpu_has_vmx_virtual_intr_delivery()) > > enable_apicv_reg_vid = 0; > > > > + if (enable_apicv_reg_vid) > > + kvm_x86_ops->update_cr8_intercept = NULL; > > + else > > + kvm_x86_ops->update_apic_irq = NULL; > > + > > if (nested) > > nested_vmx_setup_ctls_msrs(); > > > > @@ -3845,7 +3862,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > > if (!ple_gap) > > exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > > if (!enable_apicv_reg_vid) > > - exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > > + exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | > > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > return exec_control; > > } > > > > @@ -3890,6 +3908,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > > vmx_secondary_exec_control(vmx)); > > } > > > > + if (enable_apicv_reg_vid) { > > + vmcs_write64(EOI_EXIT_BITMAP0, 0); > > + 
vmcs_write64(EOI_EXIT_BITMAP1, 0); > > + vmcs_write64(EOI_EXIT_BITMAP2, 0); > > + vmcs_write64(EOI_EXIT_BITMAP3, 0); > > + spin_lock_init(&vmx->eoi_bitmap_lock); > > + > > + vmcs_write16(GUEST_INTR_STATUS, 0); > > + } > > + > > if (ple_gap) { > > vmcs_write32(PLE_GAP, ple_gap); > > vmcs_write32(PLE_WINDOW, ple_window); > > @@ -4805,6 +4833,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu) > > return emulate_instruction(vcpu, 0) == EMULATE_DONE; > > } > > > > +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > > +{ > > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > > + int vector = exit_qualification & 0xff; > > + > > + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > > + kvm_apic_set_eoi_accelerated(vcpu, vector); > > + return 1; > > +} > > + > > static int handle_apic_write(struct kvm_vcpu *vcpu) > > { > > unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > > @@ -5750,6 +5788,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { > > [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > > [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > > [EXIT_REASON_APIC_WRITE] = handle_apic_write, > > + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > > [EXIT_REASON_WBINVD] = handle_wbinvd, > > [EXIT_REASON_XSETBV] = handle_xsetbv, > > [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > > @@ -6099,6 +6138,142 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > > vmcs_write32(TPR_THRESHOLD, irr); > > } > > > > +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > > +{ > > + return enable_apicv_reg_vid; > > +} > > + > > +static void vmx_restore_rvi(struct kvm_vcpu *vcpu) > > +{ > > + int isr; > > + u16 status; > > + u8 old; > > + > > + if (!enable_apicv_reg_vid) > > + return; > > + > > + isr = kvm_apic_find_highest_isr(vcpu->arch.apic); > > + if (isr == -1) > > + return; > > + > > + status = 
vmcs_read16(GUEST_INTR_STATUS);
> > +	old = status >> 8;
> > +	if (isr != old) {
> > +		status &= 0xff;
> > +		status |= isr << 8;
> > +		vmcs_write16(GUEST_INTR_STATUS, status);
> > +	}
> > +}
> > +
> > +static void vmx_update_rvi(int vector)
> > +{
> > +	u16 status;
> > +	u8 old;
> > +
> > +	status = vmcs_read16(GUEST_INTR_STATUS);
> > +	old = (u8)status & 0xff;
> > +	if ((u8)vector != old) {
> > +		status &= ~0xff;
> > +		status |= (u8)vector;
> > +		vmcs_write16(GUEST_INTR_STATUS, status);
> > +	}
> > +}
> > +
> > +static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr)
> > +{
> > +	if (max_irr == -1)
> > +		return;
> > +
> > +	vmx_update_rvi(max_irr);
> > +}
> > +
> > +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
> > +				u32 vector)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (!enable_apicv_reg_vid)
> > +		return;
> > +
> > +	if (WARN_ONCE((vector > 255),
> > +		"KVM VMX: vector (%d) out of range\n", vector))
> > +		return;
> > +
> > +	set_bit(vector, vmx->eoi_exit_bitmap);
> > +
> > +	kvm_make_request(KVM_REQ_EOIBITMAP, vcpu);
> > +}
> > +
> > +void vmx_update_eoi_exitmap(struct kvm *kvm, struct kvm_lapic_irq *irq)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	struct kvm_lapic **dst;
> > +	struct kvm_apic_map *map;
> > +	unsigned long bitmap = 1;
> > +	int i;
> > +
> > +	rcu_read_lock();
> > +	map = rcu_dereference(kvm->arch.apic_map);
> > +
> > +	if (unlikely(!map)) {
> > +		kvm_for_each_vcpu(i, vcpu, kvm)
> > +			set_eoi_exitmap_one(vcpu, irq->vector);
> > +		goto out;
> > +	}

The suggestion was

> ioapic_write (or any other ioapic update)
> lock()
> perform update
> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
> unlock()
>
> (*) Similarly to TLB flush.
>
> The advantage is that all work becomes vcpu local. The end result
> is much simpler code.

What complexity will it remove?

--
			Gleb.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-07 17:48 ` Gleb Natapov @ 2013-01-07 21:32 ` Marcelo Tosatti 2013-01-08 0:43 ` Zhang, Yang Z 2013-01-08 10:03 ` Gleb Natapov 0 siblings, 2 replies; 22+ messages in thread From: Marcelo Tosatti @ 2013-01-07 21:32 UTC (permalink / raw) To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, Kevin Tian On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: > > ioapic_write (or any other ioapic update) > > lock() > > perform update > > make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) > > unlock() > > > > (*) Similarly to TLB flush. > > > > The advantage is that all work becomes vcpu local. The end result > > is much simpler code. > What complexity will it remove? Synchronization between multiple CPUs (except the KVM_REQ_ bit processing, which is infrastructure shared by other parts of KVM). We agreed that performance is non issue here. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-07 21:32 ` Marcelo Tosatti @ 2013-01-08 0:43 ` Zhang, Yang Z 2013-01-08 13:40 ` Marcelo Tosatti 2013-01-08 10:03 ` Gleb Natapov 1 sibling, 1 reply; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-08 0:43 UTC (permalink / raw) To: Marcelo Tosatti, Gleb Natapov Cc: kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin Marcelo Tosatti wrote on 2013-01-08: > On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: >>> ioapic_write (or any other ioapic update) >>> lock() >>> perform update >>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) >>> unlock() >>> >>> (*) Similarly to TLB flush. >>> >>> The advantage is that all work becomes vcpu local. The end result >>> is much simpler code. >> What complexity will it remove? > > Synchronization between multiple CPUs (except the KVM_REQ_ bit > processing, which is infrastructure shared by other parts of KVM). > > We agreed that performance is non issue here. The current logic is this: ioapic_write lock() perform update make request on each vcpu kick each vcpu unlock() The only difference is the way to make the request. And the complex part is performing update. With your suggestion, we still need to do the update. Why you think it is much simpler? Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
  2013-01-08  0:43       ` Zhang, Yang Z
@ 2013-01-08 13:40         ` Marcelo Tosatti
  0 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2013-01-08 13:40 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Gleb Natapov, kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin

On Tue, Jan 08, 2013 at 12:43:22AM +0000, Zhang, Yang Z wrote:
> Marcelo Tosatti wrote on 2013-01-08:
> > On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote:
> >>> ioapic_write (or any other ioapic update)
> >>> lock()
> >>> perform update
> >>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
> >>> unlock()
> >>>
> >>> (*) Similarly to TLB flush.
> >>>
> >>> The advantage is that all work becomes vcpu local. The end result
> >>> is much simpler code.
> >> What complexity will it remove?
> >
> > Synchronization between multiple CPUs (except the KVM_REQ_ bit
> > processing, which is infrastructure shared by other parts of KVM).
> >
> > We agreed that performance is non issue here.
> The current logic is this:
> ioapic_write
> lock()
> perform update
> make request on each vcpu
> kick each vcpu
> unlock()
>
> The only difference is the way to make the request. And the complex
> part is performing update. With your suggestion, we still need to do
> the update. Why you think it is much simpler?

The update should be local, because there is no reason to make it
remote.

kvm_for_each_vcpu(index, vcpu, kvm)
	kvm_x86_ops->update_exitmap_start(vcpu);

for (index = 0; index < IOAPIC_NUM_PINS; index++) {
	e = &ioapic->redirtbl[index];
	if (!e->fields.mask)
		ioapic_update_eoi_exitmap_one(ioapic, index);
}

kvm_for_each_vcpu(index, vcpu, kvm) {
	kvm_x86_ops->update_exitmap_end(vcpu);
	kvm_vcpu_kick(vcpu);
}

No need for start, end, etc calls into vcpus. All ioapic updater does
is set a request bit, similar to a remote TLB flush. Update is then
entirely done vcpu local.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-07 21:32 ` Marcelo Tosatti 2013-01-08 0:43 ` Zhang, Yang Z @ 2013-01-08 10:03 ` Gleb Natapov 2013-01-08 12:57 ` Zhang, Yang Z 2013-01-08 13:53 ` Marcelo Tosatti 1 sibling, 2 replies; 22+ messages in thread From: Gleb Natapov @ 2013-01-08 10:03 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, Kevin Tian On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: > On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: > > > ioapic_write (or any other ioapic update) > > > lock() > > > perform update > > > make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) > > > unlock() > > > > > > (*) Similarly to TLB flush. > > > > > > The advantage is that all work becomes vcpu local. The end result > > > is much simpler code. > > What complexity will it remove? > > Synchronization between multiple CPUs (except the KVM_REQ_ bit > processing, which is infrastructure shared by other parts of KVM). > Synchronization is just a lock around bitmap access. Can be replaced with RCU if it turns to be performance problem. > We agreed that performance is non issue here. Yes, if the code is indeed simpler we can take the hit, although recalculating bitmap 255 times instead of one for -smp 255 looks like a little bit excessive, but I do not see considerable simplification (if at all). 
So as far as I understand you are proposing:

vcpu0 or io thread:                     | vcpu1:
ioapic_write (or other ioapic update)   |
 lock(exitbitmap)                       |
 if (on vcpu)                           |
   ioapic_update_my_eoi_exitmap()       |
 make_all_vcpus_request(update)         | if (update requested)
                                        |   ioapic_update_my_eoi_exitmap()
 unlock(exitbitmap)                     |

The current patch logic is this:

vcpu0 or io thread:                     | vcpu1:
ioapic_write (or other ioapic update)   |
 lock(exitbitmap)                       |
 ioapic_update_all_eoi_exitmaps()       |
 make request on each vcpu              |
 kick each vcpu                         | if (update requested)
 unlock(exitbitmap)                     |   lock(exitbitmap)
                                        |   load_exitbitmap()
                                        |   unlock(exitbitmap)

If I described correctly what you are proposing I do not
see simplification since the bulk of the complexity is in the
ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both
implementations. Actually I do see complication in your idea introduced
by the fact that the case when update is done from vcpu thread have to
be handled specially.

The proposed patch may be simplified further by
make_all_vcpus_request_async(update)(*) instead of making request and
kicking each vcpu individually. In fact the way it is done now is buggy
since requests are made only for vcpus with bit set in their bitmask,
but if bit is cleared request is not made so vcpu can run with stale
bitmask.

(*) not exists yet as far as I see.

--
			Gleb.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-08 10:03 ` Gleb Natapov @ 2013-01-08 12:57 ` Zhang, Yang Z 2013-01-08 13:57 ` Marcelo Tosatti 2013-01-08 13:53 ` Marcelo Tosatti 1 sibling, 1 reply; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-08 12:57 UTC (permalink / raw) To: Gleb Natapov, Marcelo Tosatti Cc: kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin Gleb Natapov wrote on 2013-01-08: > On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: >> On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: >>>> ioapic_write (or any other ioapic update) >>>> lock() >>>> perform update >>>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) >>>> unlock() >>>> >>>> (*) Similarly to TLB flush. >>>> >>>> The advantage is that all work becomes vcpu local. The end result >>>> is much simpler code. >>> What complexity will it remove? >> >> Synchronization between multiple CPUs (except the KVM_REQ_ bit >> processing, which is infrastructure shared by other parts of KVM). >> > Synchronization is just a lock around bitmap access. Can be replaced > with RCU if it turns to be performance problem. > >> We agreed that performance is non issue here. > Yes, if the code is indeed simpler we can take the hit, although > recalculating bitmap 255 times instead of one for -smp 255 looks like a > little bit excessive, but I do not see considerable simplification (if > at all). 
> > So as far as I understand you are proposing: > > vcpu0 or io thread: | vcpu1: > ioapic_write (or other ioapic update) | > lock(exitbitmap) | > if (on vcpu) | > ioapic_update_my_eoi_exitmap() | > make_all_vcpus_request(update) | if (update requested) > | > ioapic_update_my_eoi_exitmap() > unlock(exitbitmap) | > The current patch logic is this: > > vcpu0 or io thread: | vcpu1: > ioapic_write (or other ioapic update) | > lock(exitbitmap) | > ioapic_update_all_eoi_exitmaps() | > make request on each vcpu | > kick each vcpu | if (update requested) > unlock(exitbitmap) | lock(exitbitmap) > | load_exitbitmap() > | unlock(exitbitmap) > If I described correctly what you are proposing I do not > see simplification since the bulk of the complexity is in the > ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both > implementations. Actually I do see complication in your idea introduced > by the fact that the case when update is done from vcpu thread have to > be handled specially. > > The proposed patch may be simplified further by > make_all_vcpus_request_async(update)(*) instead of making request and > kicking each vcpu individually. In fact the way it is done now is buggy > since requests are made only for vcpus with bit set in their bitmask, > but if bit is cleared request is not made so vcpu can run with stale > bitmask. ok, how about the follow logic: ioapic_write() lock() clear_eoi_exitmap_on_all_vcpus() perform update(no make request) make_all_vcpus_request(like tlb flush) unlock() Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-08 12:57 ` Zhang, Yang Z @ 2013-01-08 13:57 ` Marcelo Tosatti 2013-01-08 15:43 ` Gleb Natapov 0 siblings, 1 reply; 22+ messages in thread From: Marcelo Tosatti @ 2013-01-08 13:57 UTC (permalink / raw) To: Zhang, Yang Z Cc: Gleb Natapov, kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin On Tue, Jan 08, 2013 at 12:57:42PM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2013-01-08: > > On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: > >> On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: > >>>> ioapic_write (or any other ioapic update) > >>>> lock() > >>>> perform update > >>>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) > >>>> unlock() > >>>> > >>>> (*) Similarly to TLB flush. > >>>> > >>>> The advantage is that all work becomes vcpu local. The end result > >>>> is much simpler code. > >>> What complexity will it remove? > >> > >> Synchronization between multiple CPUs (except the KVM_REQ_ bit > >> processing, which is infrastructure shared by other parts of KVM). > >> > > Synchronization is just a lock around bitmap access. Can be replaced > > with RCU if it turns to be performance problem. > > > >> We agreed that performance is non issue here. > > Yes, if the code is indeed simpler we can take the hit, although > > recalculating bitmap 255 times instead of one for -smp 255 looks like a > > little bit excessive, but I do not see considerable simplification (if > > at all). 
> > > > So as far as I understand you are proposing: > > > > vcpu0 or io thread: | vcpu1: > > ioapic_write (or other ioapic update) | > > lock(exitbitmap) | > > if (on vcpu) | > > ioapic_update_my_eoi_exitmap() | > > make_all_vcpus_request(update) | if (update requested) > > | > > ioapic_update_my_eoi_exitmap() > > unlock(exitbitmap) | > > The current patch logic is this: > > > > vcpu0 or io thread: | vcpu1: > > ioapic_write (or other ioapic update) | > > lock(exitbitmap) | > > ioapic_update_all_eoi_exitmaps() | > > make request on each vcpu | > > kick each vcpu | if (update requested) > > unlock(exitbitmap) | lock(exitbitmap) > > | load_exitbitmap() > > | unlock(exitbitmap) > > If I described correctly what you are proposing I do not > > see simplification since the bulk of the complexity is in the > > ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both > > implementations. Actually I do see complication in your idea introduced > > by the fact that the case when update is done from vcpu thread have to > > be handled specially. > > > > The proposed patch may be simplified further by > > make_all_vcpus_request_async(update)(*) instead of making request and > > kicking each vcpu individually. In fact the way it is done now is buggy > > since requests are made only for vcpus with bit set in their bitmask, > > but if bit is cleared request is not made so vcpu can run with stale > > bitmask. 
> ok, how about the follow logic:
> ioapic_write()
> lock()
> clear_eoi_exitmap_on_all_vcpus()
> perform update(no make request)
> make_all_vcpus_request(like tlb flush)
> unlock()

Why not just

ioapic writer / map updater context
----------------------------------

ioapic_write()
make_all_vcpus_request()

(no special lock taken)

vcpu context, entry
------------------

if(check_request(KVM_REQ_, ....)) {
	ioapic_lock(); (*)
	update local EOI exit bitmap from IOAPIC
	ioapic_unlock();
}

(*) plus any other lock that paths that update the map take

>
> Best regards,
> Yang

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-08 13:57 ` Marcelo Tosatti @ 2013-01-08 15:43 ` Gleb Natapov 2013-01-09 8:07 ` Zhang, Yang Z 0 siblings, 1 reply; 22+ messages in thread From: Gleb Natapov @ 2013-01-08 15:43 UTC (permalink / raw) To: Marcelo Tosatti Cc: Zhang, Yang Z, kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin On Tue, Jan 08, 2013 at 11:57:59AM -0200, Marcelo Tosatti wrote: > On Tue, Jan 08, 2013 at 12:57:42PM +0000, Zhang, Yang Z wrote: > > Gleb Natapov wrote on 2013-01-08: > > > On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: > > >> On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: > > >>>> ioapic_write (or any other ioapic update) > > >>>> lock() > > >>>> perform update > > >>>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) > > >>>> unlock() > > >>>> > > >>>> (*) Similarly to TLB flush. > > >>>> > > >>>> The advantage is that all work becomes vcpu local. The end result > > >>>> is much simpler code. > > >>> What complexity will it remove? > > >> > > >> Synchronization between multiple CPUs (except the KVM_REQ_ bit > > >> processing, which is infrastructure shared by other parts of KVM). > > >> > > > Synchronization is just a lock around bitmap access. Can be replaced > > > with RCU if it turns to be performance problem. > > > > > >> We agreed that performance is non issue here. > > > Yes, if the code is indeed simpler we can take the hit, although > > > recalculating bitmap 255 times instead of one for -smp 255 looks like a > > > little bit excessive, but I do not see considerable simplification (if > > > at all). 
> > > > > > So as far as I understand you are proposing: > > > > > > vcpu0 or io thread: | vcpu1: > > > ioapic_write (or other ioapic update) | > > > lock(exitbitmap) | > > > if (on vcpu) | > > > ioapic_update_my_eoi_exitmap() | > > > make_all_vcpus_request(update) | if (update requested) > > > | > > > ioapic_update_my_eoi_exitmap() > > > unlock(exitbitmap) | > > > The current patch logic is this: > > > > > > vcpu0 or io thread: | vcpu1: > > > ioapic_write (or other ioapic update) | > > > lock(exitbitmap) | > > > ioapic_update_all_eoi_exitmaps() | > > > make request on each vcpu | > > > kick each vcpu | if (update requested) > > > unlock(exitbitmap) | lock(exitbitmap) > > > | load_exitbitmap() > > > | unlock(exitbitmap) > > > If I described correctly what you are proposing I do not > > > see simplification since the bulk of the complexity is in the > > > ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both > > > implementations. Actually I do see complication in your idea introduced > > > by the fact that the case when update is done from vcpu thread have to > > > be handled specially. > > > > > > The proposed patch may be simplified further by > > > make_all_vcpus_request_async(update)(*) instead of making request and > > > kicking each vcpu individually. In fact the way it is done now is buggy > > > since requests are made only for vcpus with bit set in their bitmask, > > > but if bit is cleared request is not made so vcpu can run with stale > > > bitmask. 
> > ok, how about the follow logic: > > ioapic_write() > > lock() > > clear_eoi_exitmap_on_all_vcpus() > > perform update(no make request) > > make_all_vcpus_request(like tlb flush) > > unlock() > > Why not just > > ioapic writer / map updater context > ---------------------------------- > > ioapic_write() > make_all_vcpus_request() > > > (no special lock taken) > > > vcpu context, entry > ------------------ > > if(check_request(KVM_REQ_, ....)) { > ioapic_lock(); (*) > update local EOI exit bitmap from IOAPIC > ioapic_unlock(); > } > Fine by me. Looks simpler. > > > (*) plus any other lock that paths that update the map take > > > > > > > > > Best regards, > > Yang -- Gleb. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-08 15:43 ` Gleb Natapov @ 2013-01-09 8:07 ` Zhang, Yang Z 2013-01-09 15:10 ` Marcelo Tosatti 0 siblings, 1 reply; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-09 8:07 UTC (permalink / raw) To: Gleb Natapov, Marcelo Tosatti Cc: kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin Gleb Natapov wrote on 2013-01-08: > On Tue, Jan 08, 2013 at 11:57:59AM -0200, Marcelo Tosatti wrote: >> On Tue, Jan 08, 2013 at 12:57:42PM +0000, Zhang, Yang Z wrote: >>> Gleb Natapov wrote on 2013-01-08: >>>> On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: >>>>> On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: >>>>>>> ioapic_write (or any other ioapic update) >>>>>>> lock() >>>>>>> perform update >>>>>>> make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) >>>>>>> unlock() >>>>>>> >>>>>>> (*) Similarly to TLB flush. >>>>>>> >>>>>>> The advantage is that all work becomes vcpu local. The end result >>>>>>> is much simpler code. >>>>>> What complexity will it remove? >>>>> >>>>> Synchronization between multiple CPUs (except the KVM_REQ_ bit >>>>> processing, which is infrastructure shared by other parts of KVM). >>>>> >>>> Synchronization is just a lock around bitmap access. Can be replaced >>>> with RCU if it turns to be performance problem. >>>> >>>>> We agreed that performance is non issue here. >>>> Yes, if the code is indeed simpler we can take the hit, although >>>> recalculating bitmap 255 times instead of one for -smp 255 looks like a >>>> little bit excessive, but I do not see considerable simplification (if >>>> at all). 
>>>> >>>> So as far as I understand you are proposing: >>>> >>>> vcpu0 or io thread: | vcpu1: >>>> ioapic_write (or other ioapic update) | >>>> lock(exitbitmap) | >>>> if (on vcpu) | >>>> ioapic_update_my_eoi_exitmap() | >>>> make_all_vcpus_request(update) | if (update requested) >>>> | >>>> ioapic_update_my_eoi_exitmap() >>>> unlock(exitbitmap) | >>>> The current patch logic is this: >>>> >>>> vcpu0 or io thread: | vcpu1: >>>> ioapic_write (or other ioapic update) | >>>> lock(exitbitmap) | >>>> ioapic_update_all_eoi_exitmaps() | >>>> make request on each vcpu | >>>> kick each vcpu | if (update requested) >>>> unlock(exitbitmap) | lock(exitbitmap) >>>> | load_exitbitmap() >>>> | unlock(exitbitmap) >>>> If I described correctly what you are proposing I do not >>>> see simplification since the bulk of the complexity is in the >>>> ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both >>>> implementations. Actually I do see complication in your idea introduced >>>> by the fact that the case when update is done from vcpu thread have to >>>> be handled specially. >>>> >>>> The proposed patch may be simplified further by >>>> make_all_vcpus_request_async(update)(*) instead of making request and >>>> kicking each vcpu individually. In fact the way it is done now is buggy >>>> since requests are made only for vcpus with bit set in their bitmask, >>>> but if bit is cleared request is not made so vcpu can run with stale >>>> bitmask. 
>>> ok, how about the follow logic: >>> ioapic_write() >>> lock() >>> clear_eoi_exitmap_on_all_vcpus() >>> perform update(no make request) >>> make_all_vcpus_request(like tlb flush) >>> unlock() >> >> Why not just >> >> ioapic writer / map updater context >> ---------------------------------- >> >> ioapic_write() >> make_all_vcpus_request() >> >> >> (no special lock taken) >> >> >> vcpu context, entry >> ------------------ >> >> if(check_request(KVM_REQ_, ....)) { >> ioapic_lock(); (*) >> update local EOI exit bitmap from IOAPIC In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi exit bitmap. Then make request for all vcpus. With your suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic entry write is rare, it's still not reasonable. >> ioapic_unlock(); >> } >> > Fine by me. Looks simpler. > >> >> >> (*) plus any other lock that paths that update the map take >> >> >> >> >> >>> >>> Best regards, >>> Yang > > -- > Gleb. Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-09 8:07 ` Zhang, Yang Z @ 2013-01-09 15:10 ` Marcelo Tosatti 2013-01-10 0:22 ` Zhang, Yang Z 0 siblings, 1 reply; 22+ messages in thread From: Marcelo Tosatti @ 2013-01-09 15:10 UTC (permalink / raw) To: Zhang, Yang Z Cc: Gleb Natapov, kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin On Wed, Jan 09, 2013 at 08:07:31AM +0000, Zhang, Yang Z wrote: > >> if(check_request(KVM_REQ_, ....)) { > >> ioapic_lock(); (*) > >> update local EOI exit bitmap from IOAPIC > In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi exit bitmap. Then make request for all vcpus. > With your suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic entry write is rare, it's still not reasonable. It should be fast, and very rare (as in once during system initialization, or device hotplug). Is there a particular case that makes it necessary to optimize scanning? > > >> ioapic_unlock(); > >> } > >> > > Fine by me. Looks simpler. > > > >> > >> > >> (*) plus any other lock that paths that update the map take > >> > >> > >> > >> > >> > >>> > >>> Best regards, > >>> Yang > > > > -- > > Gleb. > > > Best regards, > Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-09 15:10 ` Marcelo Tosatti @ 2013-01-10 0:22 ` Zhang, Yang Z 0 siblings, 0 replies; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-10 0:22 UTC (permalink / raw) To: Marcelo Tosatti Cc: Gleb Natapov, kvm@vger.kernel.org, Shan, Haitao, Tian, Kevin Marcelo Tosatti wrote on 2013-01-09: > On Wed, Jan 09, 2013 at 08:07:31AM +0000, Zhang, Yang Z wrote: >>>> if(check_request(KVM_REQ_, ....)) { >>>> ioapic_lock(); (*) >>>> update local EOI exit bitmap from IOAPIC >> In my patch, it traverses IOAPIC entry once and only updates target >> vcpus's eoi exit bitmap. Then make request for all vcpus. With your >> suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic > entry write is rare, it's still not reasonable. > > It should be fast, and very rare (as in once during system > initialization, or device hotplug). Ok. Will revise the patch follow your suggestion. > > Is there a particular case that makes it necessary to optimize scanning? No. >> >>>> ioapic_unlock(); >>>> } >>>> >>> Fine by me. Looks simpler. >>> >>>> >>>> >>>> (*) plus any other lock that paths that update the map take >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> Best regards, >>>>> Yang >>> >>> -- >>> Gleb. >> >> >> Best regards, >> Yang Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support 2013-01-08 10:03 ` Gleb Natapov 2013-01-08 12:57 ` Zhang, Yang Z @ 2013-01-08 13:53 ` Marcelo Tosatti 1 sibling, 0 replies; 22+ messages in thread From: Marcelo Tosatti @ 2013-01-08 13:53 UTC (permalink / raw) To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, Kevin Tian On Tue, Jan 08, 2013 at 12:03:32PM +0200, Gleb Natapov wrote: > On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: > > On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: > > > > ioapic_write (or any other ioapic update) > > > > lock() > > > > perform update > > > > make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) > > > > unlock() > > > > > > > > (*) Similarly to TLB flush. > > > > > > > > The advantage is that all work becomes vcpu local. The end result > > > > is much simpler code. > > > What complexity will it remove? > > > > Synchronization between multiple CPUs (except the KVM_REQ_ bit > > processing, which is infrastructure shared by other parts of KVM). > > > Synchronization is just a lock around bitmap access. Can be replaced > with RCU if it turns to be performance problem. > > > We agreed that performance is non issue here. > Yes, if the code is indeed simpler we can take the hit, although > recalculating bitmap 255 times instead of one for -smp 255 looks like a > little bit excessive, but I do not see considerable simplification (if > at all). 
>
> So as far as I understand you are proposing:
>
> vcpu0 or io thread:                    | vcpu1:
> ioapic_write (or other ioapic update)  |
>   lock(exitbitmap)                     |
>   if (on vcpu)                         |
>      ioapic_update_my_eoi_exitmap()    |
>   make_all_vcpus_request(update)       | if (update requested)
>                                        |    ioapic_update_my_eoi_exitmap()
>   unlock(exitbitmap)                   |
>
> The current patch logic is this:
>
> vcpu0 or io thread:                    | vcpu1:
> ioapic_write (or other ioapic update)  |
>   lock(exitbitmap)                     |
>   ioapic_update_all_eoi_exitmaps()     | <------- (1) XXX
>   make request on each vcpu            |
>   kick each vcpu                       | if (update requested)
>   unlock(exitbitmap)                   |    lock(exitbitmap)
>                                        |    load_exitbitmap()
>                                        |    unlock(exitbitmap)
>
> If I described correctly what you are proposing I do not
> see simplification since the bulk of the complexity is in the
> ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both
> implementations. Actually I do see complication in your idea introduced
> by the fact that the case when update is done from vcpu thread have to
> be handled specially.

The simplification is you don't remotely update the EOI exit bitmaps.
Its obvious that updating data locally (that is, the context that
updates data is the only user of that data) is simpler.

Yes, ioapic_update_exitmap is the same implementation, but there is no
concern over the state of the remote vcpu. For example, you don't have
to worry whether code such as this

+static void vmx_update_exitmap_start(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	memset(vmx->eoi_exit_bitmap, 0, 32);
+	spin_lock(&vmx->eoi_bitmap_lock);
+}

is safe. Because eoi_exit_bitmap is only accessed by the vcpu itself.

Does that make sense?

> The proposed patch may be simplified further by make_all_vcpus_request_async(update)(*)
> instead of making request and kicking each vcpu individually. In fact
> the way it is done now is buggy since requests are made only for vcpus
> with bit set in their bitmask, but if bit is cleared request is not made
> so vcpu can run with stale bitmask.
>
> (*) not exists yet as far as I see.
>
> --
> Gleb.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH v8 3/3] x86, apicv: add virtual x2apic support 2013-01-07 2:02 [PATCH v8 0/3] x86, apicv: Add APIC virtualization support Yang Zhang 2013-01-07 2:02 ` [PATCH v8 1/3] x86, apicv: add APICv register " Yang Zhang 2013-01-07 2:02 ` [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang @ 2013-01-07 2:02 ` Yang Zhang 2013-01-07 6:46 ` Gleb Natapov 2 siblings, 1 reply; 22+ messages in thread From: Yang Zhang @ 2013-01-07 2:02 UTC (permalink / raw) To: kvm; +Cc: gleb, haitao.shan, mtosatti, Yang Zhang, Kevin Tian From: Yang Zhang <yang.z.zhang@Intel.com> basically to benefit from apicv, we need clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/lapic.c | 2 + arch/x86/kvm/svm.c | 6 +++ arch/x86/kvm/vmx.c | 80 +++++++++++++++++++++++++++++++++++--- 5 files changed, 83 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 135603f..af9a8c3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -704,6 +704,7 @@ struct kvm_x86_ops { void (*update_exitmap_end)(struct kvm_vcpu *vcpu); void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); void (*restore_rvi)(struct kvm_vcpu *vcpu); + void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index d1ab331..694586c 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -140,6 +140,7 @@ #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 
0x00000001 #define SECONDARY_EXEC_ENABLE_EPT 0x00000002 #define SECONDARY_EXEC_RDTSCP 0x00000008 +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 #define SECONDARY_EXEC_ENABLE_VPID 0x00000020 #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e1baf37..dba5300 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1349,6 +1349,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) u32 id = kvm_apic_id(apic); u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf)); kvm_apic_set_ldr(apic, ldr); + kvm_x86_ops->enable_virtual_x2apic_mode(vcpu); + } apic->base_address = apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_BASE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index a8a8a4e..3e34e19 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3601,6 +3601,11 @@ static void svm_restore_rvi(struct kvm_vcpu *vcpu) return; } +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) +{ + return; +} + static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -4326,6 +4331,7 @@ static struct kvm_x86_ops svm_x86_ops = { .update_exitmap_end = svm_update_exitmap_end, .load_eoi_exitmap = svm_load_eoi_exitmap, .restore_rvi = svm_restore_rvi, + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode, .set_tss_addr = svm_set_tss_addr, .get_tdp_level = get_npt_level, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0c85c7e..466b05d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2551,6 +2551,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) { min2 = 0; opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | SECONDARY_EXEC_WBINVD_EXITING | SECONDARY_EXEC_ENABLE_VPID | SECONDARY_EXEC_ENABLE_EPT | @@ -3739,7 +3740,10 @@ static void free_vpid(struct 
vcpu_vmx *vmx) spin_unlock(&vmx_vpid_lock); } -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr) +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -3752,20 +3756,52 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr) * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. */ if (msr <= 0x1fff) { - __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */ - __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */ + if (type & MSR_TYPE_R) + /* read-low */ + __clear_bit(msr, msr_bitmap + 0x000 / f); + + if (type & MSR_TYPE_W) + /* write-low */ + __clear_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { msr &= 0x1fff; - __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */ - __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */ + if (type & MSR_TYPE_R) + /* read-high */ + __clear_bit(msr, msr_bitmap + 0x400 / f); + + if (type & MSR_TYPE_W) + /* write-high */ + __clear_bit(msr, msr_bitmap + 0xc00 / f); + } } static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) { if (!longmode_only) - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, + msr, MSR_TYPE_R | MSR_TYPE_W); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, + msr, MSR_TYPE_R | MSR_TYPE_W); +} + +static void vmx_disable_intercept_for_msr_read(u32 msr, bool longmode_only) +{ + if (!longmode_only) + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, + msr, MSR_TYPE_R); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, + msr, MSR_TYPE_R); +} + +static void vmx_disable_intercept_for_msr_write(u32 msr, bool longmode_only) +{ + if (!longmode_only) + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, 
+ msr, MSR_TYPE_W); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, + msr, MSR_TYPE_W); } /* @@ -3864,6 +3900,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) if (!enable_apicv_reg_vid) exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); + exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; return exec_control; } @@ -6274,6 +6311,34 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu) spin_unlock(&vmx->eoi_bitmap_lock); } +static void vmx_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) +{ + u32 exec_control; + int msr; + + if (!enable_apicv_reg_vid) + return; + + exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); + if (!(exec_control & CPU_BASED_TPR_SHADOW)) + return; + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL); + exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; + exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; + + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control); + + for (msr = 0x800; msr <= 0x8ff; msr++) + vmx_disable_intercept_for_msr_read(msr, false); + + /* TPR */ + vmx_disable_intercept_for_msr_write(0x808, false); + /* EOI */ + vmx_disable_intercept_for_msr_write(0x80b, false); + /* SELF-IPI */ + vmx_disable_intercept_for_msr_write(0x83f, false); +} + static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 exit_intr_info; @@ -7544,6 +7609,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .update_exitmap_end = vmx_update_exitmap_end, .load_eoi_exitmap = vmx_load_eoi_exitmap, .restore_rvi = vmx_restore_rvi, + .enable_virtual_x2apic_mode = vmx_enable_virtual_x2apic_mode, .set_tss_addr = vmx_set_tss_addr, .get_tdp_level = get_ept_level, -- 1.7.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v8 3/3] x86, apicv: add virtual x2apic support 2013-01-07 2:02 ` [PATCH v8 3/3] x86, apicv: add virtual x2apic support Yang Zhang @ 2013-01-07 6:46 ` Gleb Natapov 2013-01-07 6:58 ` Zhang, Yang Z 0 siblings, 1 reply; 22+ messages in thread From: Gleb Natapov @ 2013-01-07 6:46 UTC (permalink / raw) To: Yang Zhang; +Cc: kvm, haitao.shan, mtosatti, Kevin Tian On Mon, Jan 07, 2013 at 10:02:37AM +0800, Yang Zhang wrote: > From: Yang Zhang <yang.z.zhang@Intel.com> > > basically to benefit from apicv, we need clear MSR bitmap for > corresponding x2apic MSRs when guest enabled x2apic: > 0x800 - 0x8ff: no read intercept for apicv register virtualization > TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery > > Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 1 + > arch/x86/include/asm/vmx.h | 1 + > arch/x86/kvm/lapic.c | 2 + > arch/x86/kvm/svm.c | 6 +++ > arch/x86/kvm/vmx.c | 80 +++++++++++++++++++++++++++++++++++--- > 5 files changed, 83 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 135603f..af9a8c3 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -704,6 +704,7 @@ struct kvm_x86_ops { > void (*update_exitmap_end)(struct kvm_vcpu *vcpu); > void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); > void (*restore_rvi)(struct kvm_vcpu *vcpu); > + void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index d1ab331..694586c 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -140,6 +140,7 @@ > #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001 > #define SECONDARY_EXEC_ENABLE_EPT 
0x00000002 > #define SECONDARY_EXEC_RDTSCP 0x00000008 > +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 > #define SECONDARY_EXEC_ENABLE_VPID 0x00000020 > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index e1baf37..dba5300 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1349,6 +1349,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) > u32 id = kvm_apic_id(apic); > u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf)); > kvm_apic_set_ldr(apic, ldr); > + kvm_x86_ops->enable_virtual_x2apic_mode(vcpu); And where do you disable it? > + > } > apic->base_address = apic->vcpu->arch.apic_base & > MSR_IA32_APICBASE_BASE; > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index a8a8a4e..3e34e19 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3601,6 +3601,11 @@ static void svm_restore_rvi(struct kvm_vcpu *vcpu) > return; > } > > +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) > +{ > + return; > +} > + > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -4326,6 +4331,7 @@ static struct kvm_x86_ops svm_x86_ops = { > .update_exitmap_end = svm_update_exitmap_end, > .load_eoi_exitmap = svm_load_eoi_exitmap, > .restore_rvi = svm_restore_rvi, > + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode, > > .set_tss_addr = svm_set_tss_addr, > .get_tdp_level = get_npt_level, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 0c85c7e..466b05d 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2551,6 +2551,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) { > min2 = 0; > opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | > + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | > SECONDARY_EXEC_WBINVD_EXITING | > 
SECONDARY_EXEC_ENABLE_VPID | > SECONDARY_EXEC_ENABLE_EPT | > @@ -3739,7 +3740,10 @@ static void free_vpid(struct vcpu_vmx *vmx) > spin_unlock(&vmx_vpid_lock); > } > > -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr) > +#define MSR_TYPE_R 1 > +#define MSR_TYPE_W 2 > +static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, > + u32 msr, int type) > { > int f = sizeof(unsigned long); > > @@ -3752,20 +3756,52 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr) > * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. > */ > if (msr <= 0x1fff) { > - __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */ > - __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */ > + if (type & MSR_TYPE_R) > + /* read-low */ > + __clear_bit(msr, msr_bitmap + 0x000 / f); > + > + if (type & MSR_TYPE_W) > + /* write-low */ > + __clear_bit(msr, msr_bitmap + 0x800 / f); > + > } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { > msr &= 0x1fff; > - __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */ > - __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */ > + if (type & MSR_TYPE_R) > + /* read-high */ > + __clear_bit(msr, msr_bitmap + 0x400 / f); > + > + if (type & MSR_TYPE_W) > + /* write-high */ > + __clear_bit(msr, msr_bitmap + 0xc00 / f); > + > } > } > > static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) > { > if (!longmode_only) > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr); > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr); > + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, > + msr, MSR_TYPE_R | MSR_TYPE_W); > + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, > + msr, MSR_TYPE_R | MSR_TYPE_W); > +} > + > +static void vmx_disable_intercept_for_msr_read(u32 msr, bool longmode_only) > +{ > + if (!longmode_only) > + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, > + msr, MSR_TYPE_R); > + 
__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, > + msr, MSR_TYPE_R); > +} > + > +static void vmx_disable_intercept_for_msr_write(u32 msr, bool longmode_only) > +{ > + if (!longmode_only) > + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, > + msr, MSR_TYPE_W); > + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, > + msr, MSR_TYPE_W); > } > > /* > @@ -3864,6 +3900,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > if (!enable_apicv_reg_vid) > exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | > SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > + exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; > return exec_control; > } > > @@ -6274,6 +6311,34 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu) > spin_unlock(&vmx->eoi_bitmap_lock); > } > > +static void vmx_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) > +{ > + u32 exec_control; > + int msr; > + > + if (!enable_apicv_reg_vid) > + return; > + > + exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); > + if (!(exec_control & CPU_BASED_TPR_SHADOW)) > + return; > + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL); > + exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; > + exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; > + > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control); > + > + for (msr = 0x800; msr <= 0x8ff; msr++) > + vmx_disable_intercept_for_msr_read(msr, false); > + > + /* TPR */ > + vmx_disable_intercept_for_msr_write(0x808, false); > + /* EOI */ > + vmx_disable_intercept_for_msr_write(0x80b, false); > + /* SELF-IPI */ > + vmx_disable_intercept_for_msr_write(0x83f, false); > +} > + > static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) > { > u32 exit_intr_info; > @@ -7544,6 +7609,7 @@ static struct kvm_x86_ops vmx_x86_ops = { > .update_exitmap_end = vmx_update_exitmap_end, > .load_eoi_exitmap = vmx_load_eoi_exitmap, > .restore_rvi = vmx_restore_rvi, > + .enable_virtual_x2apic_mode = vmx_enable_virtual_x2apic_mode, > > 
.set_tss_addr = vmx_set_tss_addr, > .get_tdp_level = get_ept_level, > -- > 1.7.1 -- Gleb. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH v8 3/3] x86, apicv: add virtual x2apic support 2013-01-07 6:46 ` Gleb Natapov @ 2013-01-07 6:58 ` Zhang, Yang Z 2013-01-07 7:07 ` Gleb Natapov 0 siblings, 1 reply; 22+ messages in thread From: Zhang, Yang Z @ 2013-01-07 6:58 UTC (permalink / raw) To: Gleb Natapov Cc: kvm@vger.kernel.org, Shan, Haitao, mtosatti@redhat.com, Tian, Kevin Gleb Natapov wrote on 2013-01-07: > On Mon, Jan 07, 2013 at 10:02:37AM +0800, Yang Zhang wrote: >> From: Yang Zhang <yang.z.zhang@Intel.com> >> >> basically to benefit from apicv, we need clear MSR bitmap for >> corresponding x2apic MSRs when guest enabled x2apic: >> 0x800 - 0x8ff: no read intercept for apicv register virtualization >> TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery >> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> >> Signed-off-by: Kevin Tian <kevin.tian@intel.com> >> --- >> arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/vmx.h >> | 1 + arch/x86/kvm/lapic.c | 2 + arch/x86/kvm/svm.c >> | 6 +++ arch/x86/kvm/vmx.c | 80 >> +++++++++++++++++++++++++++++++++++--- 5 files changed, 83 >> insertions(+), 7 deletions(-) >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h index 135603f..af9a8c3 100644 --- >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h >> @@ -704,6 +704,7 @@ struct kvm_x86_ops { >> void (*update_exitmap_end)(struct kvm_vcpu *vcpu); void >> (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); void >> (*restore_rvi)(struct kvm_vcpu *vcpu); + void >> (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); int >> (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int >> (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, >> gfn_t gfn, bool is_mmio); >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >> index d1ab331..694586c 100644 >> --- a/arch/x86/include/asm/vmx.h >> +++ b/arch/x86/include/asm/vmx.h >> @@ -140,6 +140,7 @@ >> #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001 
#define >> SECONDARY_EXEC_ENABLE_EPT 0x00000002 #define >> SECONDARY_EXEC_RDTSCP 0x00000008 +#define >> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 #define >> SECONDARY_EXEC_ENABLE_VPID 0x00000020 #define >> SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define >> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index e1baf37..dba5300 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -1349,6 +1349,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 > value) >> u32 id = kvm_apic_id(apic); >> u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf)); >> kvm_apic_set_ldr(apic, ldr); >> + kvm_x86_ops->enable_virtual_x2apic_mode(vcpu); > And where do you disable it? Yes, need to disable it when guest rolls back to xapic mode. Will add it in next patch. >> + >> } >> apic->base_address = apic->vcpu->arch.apic_base & >> MSR_IA32_APICBASE_BASE; >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index a8a8a4e..3e34e19 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -3601,6 +3601,11 @@ static void svm_restore_rvi(struct kvm_vcpu *vcpu) >> return; >> } >> +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) >> +{ >> + return; >> +} >> + >> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm >> *svm = to_svm(vcpu); @@ -4326,6 +4331,7 @@ static struct kvm_x86_ops >> svm_x86_ops = { .update_exitmap_end = svm_update_exitmap_end, >> .load_eoi_exitmap = svm_load_eoi_exitmap, .restore_rvi = >> svm_restore_rvi, >> + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode, >> >> .set_tss_addr = svm_set_tss_addr, >> .get_tdp_level = get_npt_level, >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 0c85c7e..466b05d 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -2551,6 +2551,7 @@ static __init int setup_vmcs_config(struct > vmcs_config *vmcs_conf) >> if (_cpu_based_exec_control & 
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) >> { min2 = 0; opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | >> + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | >> SECONDARY_EXEC_WBINVD_EXITING | SECONDARY_EXEC_ENABLE_VPID | >> SECONDARY_EXEC_ENABLE_EPT | @@ -3739,7 +3740,10 @@ static void >> free_vpid(struct vcpu_vmx *vmx) spin_unlock(&vmx_vpid_lock); } >> -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, >> u32 msr) +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +static void >> __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 >> msr, int type) >> { >> int f = sizeof(unsigned long); >> @@ -3752,20 +3756,52 @@ static void > __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr) >> * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. >> */ >> if (msr <= 0x1fff) { >> - __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */ >> - __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */ >> + if (type & MSR_TYPE_R) >> + /* read-low */ >> + __clear_bit(msr, msr_bitmap + 0x000 / f); >> + >> + if (type & MSR_TYPE_W) >> + /* write-low */ >> + __clear_bit(msr, msr_bitmap + 0x800 / f); >> + >> } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { >> msr &= 0x1fff; >> - __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */ >> - __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */ >> + if (type & MSR_TYPE_R) >> + /* read-high */ >> + __clear_bit(msr, msr_bitmap + 0x400 / f); >> + >> + if (type & MSR_TYPE_W) >> + /* write-high */ >> + __clear_bit(msr, msr_bitmap + 0xc00 / f); >> + >> } >> } >> >> static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) >> { >> if (!longmode_only) >> - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr); >> - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr); >> + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, + msr, >> MSR_TYPE_R | MSR_TYPE_W); >> + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, + msr, >> MSR_TYPE_R | 
MSR_TYPE_W); +} + +static void >> vmx_disable_intercept_for_msr_read(u32 msr, bool longmode_only) +{ + if >> (!longmode_only) >> + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, + msr, >> MSR_TYPE_R); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, >> + msr, MSR_TYPE_R); +} + +static void >> vmx_disable_intercept_for_msr_write(u32 msr, bool longmode_only) +{ >> + if (!longmode_only) >> + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, + msr, >> MSR_TYPE_W); + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, >> + msr, MSR_TYPE_W); >> } >> >> /* >> @@ -3864,6 +3900,7 @@ static u32 vmx_secondary_exec_control(struct > vcpu_vmx *vmx) >> if (!enable_apicv_reg_vid) exec_control &= >> ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | >> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); + exec_control &= >> ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; return exec_control; } >> @@ -6274,6 +6311,34 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu > *vcpu) >> spin_unlock(&vmx->eoi_bitmap_lock); >> } >> +static void vmx_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) >> +{ >> + u32 exec_control; >> + int msr; >> + >> + if (!enable_apicv_reg_vid) >> + return; >> + >> + exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); >> + if (!(exec_control & CPU_BASED_TPR_SHADOW)) >> + return; >> + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL); >> + exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; >> + exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; >> + >> + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control); >> + >> + for (msr = 0x800; msr <= 0x8ff; msr++) >> + vmx_disable_intercept_for_msr_read(msr, false); >> + >> + /* TPR */ >> + vmx_disable_intercept_for_msr_write(0x808, false); >> + /* EOI */ >> + vmx_disable_intercept_for_msr_write(0x80b, false); >> + /* SELF-IPI */ >> + vmx_disable_intercept_for_msr_write(0x83f, false); >> +} >> + >> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 >> exit_intr_info; @@ -7544,6 +7609,7 @@ 
static struct kvm_x86_ops >> vmx_x86_ops = { .update_exitmap_end = vmx_update_exitmap_end, >> .load_eoi_exitmap = vmx_load_eoi_exitmap, .restore_rvi = >> vmx_restore_rvi, >> + .enable_virtual_x2apic_mode = vmx_enable_virtual_x2apic_mode, >> >> .set_tss_addr = vmx_set_tss_addr, >> .get_tdp_level = get_ept_level, >> -- >> 1.7.1 > > -- > Gleb. Best regards, Yang ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 3/3] x86, apicv: add virtual x2apic support 2013-01-07 6:58 ` Zhang, Yang Z @ 2013-01-07 7:07 ` Gleb Natapov 0 siblings, 0 replies; 22+ messages in thread From: Gleb Natapov @ 2013-01-07 7:07 UTC (permalink / raw) To: Zhang, Yang Z Cc: kvm@vger.kernel.org, Shan, Haitao, mtosatti@redhat.com, Tian, Kevin On Mon, Jan 07, 2013 at 06:58:15AM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2013-01-07: > > On Mon, Jan 07, 2013 at 10:02:37AM +0800, Yang Zhang wrote: > >> From: Yang Zhang <yang.z.zhang@Intel.com> > >> > >> basically to benefit from apicv, we need clear MSR bitmap for > >> corresponding x2apic MSRs when guest enabled x2apic: > >> 0x800 - 0x8ff: no read intercept for apicv register virtualization > >> TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery > >> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> > >> Signed-off-by: Kevin Tian <kevin.tian@intel.com> > >> --- > >> arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/vmx.h > >> | 1 + arch/x86/kvm/lapic.c | 2 + arch/x86/kvm/svm.c > >> | 6 +++ arch/x86/kvm/vmx.c | 80 > >> +++++++++++++++++++++++++++++++++++--- 5 files changed, 83 > >> insertions(+), 7 deletions(-) > >> diff --git a/arch/x86/include/asm/kvm_host.h > >> b/arch/x86/include/asm/kvm_host.h index 135603f..af9a8c3 100644 --- > >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h > >> @@ -704,6 +704,7 @@ struct kvm_x86_ops { > >> void (*update_exitmap_end)(struct kvm_vcpu *vcpu); void > >> (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); void > >> (*restore_rvi)(struct kvm_vcpu *vcpu); + void > >> (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); int > >> (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int > >> (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, > >> gfn_t gfn, bool is_mmio); > >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > >> index d1ab331..694586c 100644 > >> --- a/arch/x86/include/asm/vmx.h > >> +++ 
b/arch/x86/include/asm/vmx.h > >> @@ -140,6 +140,7 @@ > >> #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001 #define > >> SECONDARY_EXEC_ENABLE_EPT 0x00000002 #define > >> SECONDARY_EXEC_RDTSCP 0x00000008 +#define > >> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 #define > >> SECONDARY_EXEC_ENABLE_VPID 0x00000020 #define > >> SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define > >> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >> index e1baf37..dba5300 100644 > >> --- a/arch/x86/kvm/lapic.c > >> +++ b/arch/x86/kvm/lapic.c > >> @@ -1349,6 +1349,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 > > value) > >> u32 id = kvm_apic_id(apic); > >> u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf)); > >> kvm_apic_set_ldr(apic, ldr); > >> + kvm_x86_ops->enable_virtual_x2apic_mode(vcpu); > > And where do you disable it? > Yes, need to disable it when guest rolls back to xapic mode. Will add it in next patch. > You also need to reorder patches 2 and 3. Otherwise x2apic will be broken after patch 2. -- Gleb. ^ permalink raw reply [flat|nested] 22+ messages in thread
Thread overview: 22+ messages (end of thread; newest: 2013-01-10  0:22 UTC)

  2013-01-07  2:02 [PATCH v8 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
  2013-01-07  2:02 ` [PATCH v8 1/3] x86, apicv: add APICv register " Yang Zhang
  2013-01-07  2:02 ` [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
  2013-01-07  7:51   ` Gleb Natapov
  2013-01-07 13:04     ` Zhang, Yang Z
  2013-01-07 13:52       ` Marcelo Tosatti
  2013-01-07 17:48         ` Gleb Natapov
  2013-01-07 21:32           ` Marcelo Tosatti
  2013-01-08  0:43             ` Zhang, Yang Z
  2013-01-08 13:40               ` Marcelo Tosatti
  2013-01-08 10:03             ` Gleb Natapov
  2013-01-08 12:57               ` Zhang, Yang Z
  2013-01-08 13:57                 ` Marcelo Tosatti
  2013-01-08 15:43                   ` Gleb Natapov
  2013-01-09  8:07                     ` Zhang, Yang Z
  2013-01-09 15:10                       ` Marcelo Tosatti
  2013-01-10  0:22                         ` Zhang, Yang Z
  2013-01-08 13:53             ` Marcelo Tosatti
  2013-01-07  2:02 ` [PATCH v8 3/3] x86, apicv: add virtual x2apic support Yang Zhang
  2013-01-07  6:46   ` Gleb Natapov
  2013-01-07  6:58     ` Zhang, Yang Z
  2013-01-07  7:07       ` Gleb Natapov