* [PATCH v1 0/3] x86: SVM: add additional SVM NPF error and use HW GPA
@ 2016-11-14 22:15 Brijesh Singh
2016-11-14 22:15 ` [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes Brijesh Singh
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Brijesh Singh @ 2016-11-14 22:15 UTC (permalink / raw)
To: kvm
Cc: Thomas.Lendacky, brijesh.singh, rkrcmar, joro, x86, linux-kernel,
mingo, hpa, pbonzini, tglx, bp
(resending; forgot to add Tom Lendacky and Borislav Petkov to the CC list)
This patch series is taken from the SEV RFC series [1]. These patches do not
depend on the SEV feature and can be reviewed and merged on their own.
- Add support for additional SVM NPF error codes
- Add kvm_fast_pio_in support
- Use the hardware provided GPA instead of page walk
[1] http://marc.info/?l=linux-mm&m=147190814023863&w=2
Tom Lendacky (3):
kvm: svm: Add support for additional SVM NPF error codes
kvm: svm: Add kvm_fast_pio_in support
kvm: svm: Use the hardware provided GPA instead of page walk
arch/x86/include/asm/kvm_emulate.h | 3 ++
arch/x86/include/asm/kvm_host.h | 15 ++++++++-
arch/x86/kvm/mmu.c | 20 +++++++++++-
arch/x86/kvm/svm.c | 16 +++++++---
arch/x86/kvm/x86.c | 60 +++++++++++++++++++++++++++++++++++-
5 files changed, 106 insertions(+), 8 deletions(-)
--
Brijesh Singh
* [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes
2016-11-14 22:15 [PATCH v1 0/3] x86: SVM: add additional SVM NPF error and use HW GPA Brijesh Singh
@ 2016-11-14 22:15 ` Brijesh Singh
2016-11-21 15:12 ` Paolo Bonzini
2016-11-14 22:15 ` [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support Brijesh Singh
2016-11-14 22:16 ` [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk Brijesh Singh
2 siblings, 1 reply; 13+ messages in thread
From: Brijesh Singh @ 2016-11-14 22:15 UTC (permalink / raw)
To: kvm
Cc: Thomas.Lendacky, brijesh.singh, rkrcmar, joro, x86, linux-kernel,
mingo, hpa, pbonzini, tglx, bp
From: Tom Lendacky <thomas.lendacky@amd.com>
AMD hardware adds two additional bits to aid in nested page fault handling.
Bit 32 - NPF occurred while translating the guest's final physical address
Bit 33 - NPF occurred while translating the guest page tables
The guest page tables fault indicator can be used as an aid for nested
virtualization. Using V0 for the host, V1 for the first-level guest and
V2 for the second-level guest, when both V1 and V2 are using nested paging
there are currently a number of unnecessary instruction emulations. When
V2 is launched, shadow paging is used in V1 for the nested tables of V2. As
a result, KVM marks these pages as RO in the host nested page tables. When
V2 exits and we resume V1, these pages are still marked RO.
Every nested walk of a guest page table is treated as a user-level write
access, and this causes a lot of NPFs because the V1 page tables are marked
RO in the V0 nested tables. When these NPFs occur while executing V1, KVM
sees a write to a read-only page, emulates the V1 instruction and unprotects
the page (marking it RW). This patch looks for the case where we get an NPF
due to a guest page table walk where the page was marked RO. It immediately
unprotects the page and resumes the guest, leading to far fewer instruction
emulations when nested virtualization is used.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
arch/x86/kvm/mmu.c | 20 ++++++++++++++++++--
arch/x86/kvm/svm.c | 2 +-
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdde807..da07e17 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -191,6 +191,8 @@ enum {
#define PFERR_RSVD_BIT 3
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
+#define PFERR_GUEST_FINAL_BIT 32
+#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -198,6 +200,13 @@ enum {
#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
#define PFERR_PK_MASK (1U << PFERR_PK_BIT)
+#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
+#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
+
+#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
+ PFERR_USER_MASK | \
+ PFERR_WRITE_MASK | \
+ PFERR_PRESENT_MASK)
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
@@ -1203,7 +1212,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
-int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
+int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e98..f633d29 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4508,7 +4508,7 @@ static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
}
-int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
+int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
void *insn, int insn_len)
{
int r, emulation_type = EMULTYPE_RETRY;
@@ -4527,12 +4527,28 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
return r;
}
- r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
+ r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code),
+ false);
if (r < 0)
return r;
if (!r)
return 1;
+ /*
+ * Before emulating the instruction, check if the error code
+ * was due to a RO violation while translating the guest page.
+ * This can occur when using nested virtualization with nested
+ * paging in both guests. If true, we simply unprotect the page
+ * and resume the guest.
+ *
+ * Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
+ * in PFERR_NESTED_GUEST_PAGE)
+ */
+ if (error_code == PFERR_NESTED_GUEST_PAGE) {
+ kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
+ return 1;
+ }
+
if (mmio_info_in_cache(vcpu, cr2, direct))
emulation_type = 0;
emulate:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8ca1eca..4e462bb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2074,7 +2074,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
static int pf_interception(struct vcpu_svm *svm)
{
u64 fault_address = svm->vmcb->control.exit_info_2;
- u32 error_code;
+ u64 error_code;
int r = 1;
switch (svm->apf_reason) {
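For reference, the new fast-path check spelled out bit by bit (a sketch;
the values follow directly from the definitions in the hunks above):

    /*
     * PFERR_NESTED_GUEST_PAGE is:
     *   PFERR_PRESENT_MASK     bit  0
     *   PFERR_WRITE_MASK       bit  1
     *   PFERR_USER_MASK        bit  2
     *   PFERR_GUEST_PAGE_MASK  bit 33  (NPF during a guest table walk)
     * i.e. 0x200000007ULL. Bits 32/33 are why error_code must widen
     * from u32 to u64; a u32 would silently truncate them.
     */
    if (error_code == PFERR_NESTED_GUEST_PAGE) {
        kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
        return 1;    /* resume the guest without emulating */
    }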
* [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support
2016-11-14 22:15 [PATCH v1 0/3] x86: SVM: add additional SVM NPF error and use HW GPA Brijesh Singh
2016-11-14 22:15 ` [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes Brijesh Singh
@ 2016-11-14 22:15 ` Brijesh Singh
2016-11-21 14:50 ` Paolo Bonzini
2016-11-14 22:16 ` [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk Brijesh Singh
2 siblings, 1 reply; 13+ messages in thread
From: Brijesh Singh @ 2016-11-14 22:15 UTC (permalink / raw)
To: kvm
Cc: Thomas.Lendacky, brijesh.singh, rkrcmar, joro, x86, linux-kernel,
mingo, hpa, pbonzini, tglx, bp
From: Tom Lendacky <thomas.lendacky@amd.com>
Update the I/O interception support to add the kvm_fast_pio_in function,
which speeds up the 'in' instruction in the same way kvm_fast_pio_out
speeds up the 'out' instruction.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 5 +++--
arch/x86/kvm/x86.c | 43 +++++++++++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index da07e17..77cb3f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1133,6 +1133,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
struct x86_emulate_ctxt;
int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
+int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port);
void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4e462bb..5e64e656 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2270,7 +2270,7 @@ static int io_interception(struct vcpu_svm *svm)
++svm->vcpu.stat.io_exits;
string = (io_info & SVM_IOIO_STR_MASK) != 0;
in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
- if (string || in)
+ if (string)
return emulate_instruction(vcpu, 0) == EMULATE_DONE;
port = io_info >> 16;
@@ -2278,7 +2278,8 @@ static int io_interception(struct vcpu_svm *svm)
svm->next_rip = svm->vmcb->control.exit_info_2;
skip_emulated_instruction(&svm->vcpu);
- return kvm_fast_pio_out(vcpu, size, port);
+ return in ? kvm_fast_pio_in(vcpu, size, port)
+ : kvm_fast_pio_out(vcpu, size, port);
}
static int nmi_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3017de0..d02aeff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5617,6 +5617,49 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
}
EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
+static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
+{
+ unsigned long val;
+
+ /* We should only ever be called with arch.pio.count equal to 1 */
+ BUG_ON(vcpu->arch.pio.count != 1);
+
+ /* For size less than 4 we merge, else we zero extend */
+ val = (vcpu->arch.pio.size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX)
+ : 0;
+
+ /*
+ * Since vcpu->arch.pio.count == 1 let emulator_pio_in_emulated perform
+ * the copy and tracing
+ */
+ emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
+ vcpu->arch.pio.port, &val, 1);
+ kvm_register_write(vcpu, VCPU_REGS_RAX, val);
+
+ return 1;
+}
+
+int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port)
+{
+ unsigned long val;
+ int ret;
+
+ /* For size less than 4 we merge, else we zero extend */
+ val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
+
+ ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
+ &val, 1);
+ if (ret) {
+ kvm_register_write(vcpu, VCPU_REGS_RAX, val);
+ return ret;
+ }
+
+ vcpu->arch.complete_userspace_io = complete_fast_pio_in;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_fast_pio_in);
+
static int kvmclock_cpu_down_prep(unsigned int cpu)
{
__this_cpu_write(cpu_tsc_khz, 0);
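How the two pieces fit together, roughly (a sketch of the flow, not code
from the patch):

    /*
     * kvm_fast_pio_in():
     *   ret = emulator_pio_in_emulated(...);
     *   if (ret != 0)  - the data was available in-kernel, so it is
     *                    merged/zero-extended into RAX and the guest
     *                    resumes immediately;
     *   else           - a KVM_EXIT_IO was queued for userspace and
     *                    complete_fast_pio_in() is registered, so on
     *                    the next KVM_RUN (with pio.count == 1) the
     *                    returned value is folded into RAX before
     *                    the guest resumes.
     */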
* [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk
2016-11-14 22:15 [PATCH v1 0/3] x86: SVM: add additional SVM NPF error and use HW GPA Brijesh Singh
2016-11-14 22:15 ` [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes Brijesh Singh
2016-11-14 22:15 ` [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support Brijesh Singh
@ 2016-11-14 22:16 ` Brijesh Singh
2016-11-21 15:07 ` Paolo Bonzini
2 siblings, 1 reply; 13+ messages in thread
From: Brijesh Singh @ 2016-11-14 22:16 UTC (permalink / raw)
To: kvm
Cc: Thomas.Lendacky, brijesh.singh, rkrcmar, joro, x86, linux-kernel,
mingo, hpa, pbonzini, tglx, bp
From: Tom Lendacky <thomas.lendacky@amd.com>
When a guest causes an NPF which requires emulation, KVM sometimes walks
the guest page tables to translate the GVA to a GPA. This is unnecessary
most of the time on AMD hardware since the hardware provides the GPA in
EXITINFO2.
The only exceptions are string operations that use a rep prefix and
instructions that access two memory locations. With rep, the GPA will only
be the value of the initial NPF, and with two memory operands we won't know
which memory address was translated into EXITINFO2.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
arch/x86/include/asm/kvm_emulate.h | 3 +++
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm.c | 9 ++++++++-
arch/x86/kvm/x86.c | 17 ++++++++++++++++-
4 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index e9cd7be..2d1ac09 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -344,6 +344,9 @@ struct x86_emulate_ctxt {
struct read_cache mem_read;
};
+/* String operation identifier (matches the definition in emulate.c) */
+#define CTXT_STRING_OP (1 << 13)
+
/* Repeat String Operation Prefix */
#define REPE_PREFIX 0xf3
#define REPNE_PREFIX 0xf2
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 77cb3f9..fd5b1c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -668,6 +668,9 @@ struct kvm_vcpu_arch {
int pending_ioapic_eoi;
int pending_external_vector;
+
+ /* GPA available (AMD only) */
+ bool gpa_available;
};
struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5e64e656..b442c5a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -275,6 +275,9 @@ static int avic;
module_param(avic, int, S_IRUGO);
#endif
+/* EXITINFO2 contains valid GPA */
+static bool gpa_avail = true;
+
/* AVIC VM ID bit masks and lock */
static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR);
static DEFINE_SPINLOCK(avic_vm_id_lock);
@@ -1055,8 +1058,10 @@ static __init int svm_hardware_setup(void)
goto err;
}
- if (!boot_cpu_has(X86_FEATURE_NPT))
+ if (!boot_cpu_has(X86_FEATURE_NPT)) {
npt_enabled = false;
+ gpa_avail = false;
+ }
if (npt_enabled && !npt) {
printk(KERN_INFO "kvm: Nested Paging disabled\n");
@@ -4192,6 +4197,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 = svm->vmcb->save.cr0;
if (npt_enabled)
vcpu->arch.cr3 = svm->vmcb->save.cr3;
+ if (gpa_avail)
+ vcpu->arch.gpa_available = (exit_code == SVM_EXIT_NPF);
if (unlikely(svm->nested.exit_required)) {
nested_svm_vmexit(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d02aeff..c290794 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4420,7 +4420,19 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
return 1;
}
- *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
+ /*
+ * If the exit was due to an NPF we may already have a GPA.
+ * If the GPA is present, use it to avoid the GVA-to-GPA table
+ * walk. Note, this cannot be used on string operations since
+ * a string operation using rep will only have the initial GPA
+ * from when the NPF occurred.
+ */
+ if (vcpu->arch.gpa_available &&
+ !(vcpu->arch.emulate_ctxt.d & CTXT_STRING_OP))
+ *gpa = exception->address;
+ else
+ *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access,
+ exception);
if (*gpa == UNMAPPED_GVA)
return -1;
@@ -5542,6 +5554,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
}
restart:
+ /* Save the faulting GPA (cr2) in the address field */
+ ctxt->exception.address = cr2;
+
r = x86_emulate_insn(ctxt);
if (r == EMULATION_INTERCEPTED)
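The new condition in vcpu_mmio_gva_to_gpa(), pulled out into a helper for
clarity (a sketch using only names from this patch; the helper name is
hypothetical and not part of the patch):

    /* True when the hardware-reported GPA from EXITINFO2 is usable:
     * the exit was an NPF and the instruction is not a string op
     * (rep string ops report only the GPA of the initial fault).
     */
    static bool emul_can_use_hw_gpa(struct kvm_vcpu *vcpu)
    {
        return vcpu->arch.gpa_available &&
               !(vcpu->arch.emulate_ctxt.d & CTXT_STRING_OP);
    }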
* Re: [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support
2016-11-14 22:15 ` [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support Brijesh Singh
@ 2016-11-21 14:50 ` Paolo Bonzini
2016-11-22 14:25 ` Tom Lendacky
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-11-21 14:50 UTC (permalink / raw)
To: Brijesh Singh, kvm
Cc: Thomas.Lendacky, rkrcmar, joro, x86, linux-kernel, mingo, hpa,
tglx, bp
On 14/11/2016 23:15, Brijesh Singh wrote:
> + /* For size less than 4 we merge, else we zero extend */
> + val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
Are you sure it shouldn't always zero extend the high 32-bits? So "val"
should be declared as u32.
Paolo
> + ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
> + &val, 1);
> + if (ret) {
> + kvm_register_write(vcpu, VCPU_REGS_RAX, val);
> + return ret;
> + }
* Re: [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk
2016-11-14 22:16 ` [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk Brijesh Singh
@ 2016-11-21 15:07 ` Paolo Bonzini
2016-11-22 14:13 ` Tom Lendacky
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-11-21 15:07 UTC (permalink / raw)
To: Brijesh Singh, kvm
Cc: Thomas.Lendacky, rkrcmar, joro, x86, linux-kernel, mingo, hpa,
tglx, bp
On 14/11/2016 23:16, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> When a guest causes an NPF which requires emulation, KVM sometimes walks
> the guest page tables to translate the GVA to a GPA. This is unnecessary
> most of the time on AMD hardware since the hardware provides the GPA in
> EXITINFO2.
>
> The only exceptions are string operations that use a rep prefix and
> instructions that access two memory locations. With rep, the GPA will only
> be the value of the initial NPF, and with two memory operands we won't know
> which memory address was translated into EXITINFO2.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Reviewed-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
> arch/x86/include/asm/kvm_emulate.h | 3 +++
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/kvm/svm.c | 9 ++++++++-
> arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> 4 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index e9cd7be..2d1ac09 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -344,6 +344,9 @@ struct x86_emulate_ctxt {
> struct read_cache mem_read;
> };
>
> +/* String operation identifier (matches the definition in emulate.c) */
> +#define CTXT_STRING_OP (1 << 13)
> +
> /* Repeat String Operation Prefix */
> #define REPE_PREFIX 0xf3
> #define REPNE_PREFIX 0xf2
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 77cb3f9..fd5b1c8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -668,6 +668,9 @@ struct kvm_vcpu_arch {
>
> int pending_ioapic_eoi;
> int pending_external_vector;
> +
> + /* GPA available (AMD only) */
> + bool gpa_available;
> };
>
> struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5e64e656..b442c5a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -275,6 +275,9 @@ static int avic;
> module_param(avic, int, S_IRUGO);
> #endif
>
> +/* EXITINFO2 contains valid GPA */
> +static bool gpa_avail = true;
> +
> /* AVIC VM ID bit masks and lock */
> static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR);
> static DEFINE_SPINLOCK(avic_vm_id_lock);
> @@ -1055,8 +1058,10 @@ static __init int svm_hardware_setup(void)
> goto err;
> }
>
> - if (!boot_cpu_has(X86_FEATURE_NPT))
> + if (!boot_cpu_has(X86_FEATURE_NPT)) {
> npt_enabled = false;
> + gpa_avail = false;
This is not necessary, since you will never have exit_code ==
SVM_EXIT_NPF && !gpa_avail.
> + }
>
> if (npt_enabled && !npt) {
> printk(KERN_INFO "kvm: Nested Paging disabled\n");
> @@ -4192,6 +4197,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
> vcpu->arch.cr0 = svm->vmcb->save.cr0;
> if (npt_enabled)
> vcpu->arch.cr3 = svm->vmcb->save.cr3;
> + if (gpa_avail)
> + vcpu->arch.gpa_available = (exit_code == SVM_EXIT_NPF);
The "if", and the body moved just before
return svm_exit_handlers[exit_code](svm);
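i.e. something like this (untested sketch):

    vcpu->arch.gpa_available = (exit_code == SVM_EXIT_NPF);
    return svm_exit_handlers[exit_code](svm);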
I'll try doing a similar patch for Intel too, for better testing.
>
> if (unlikely(svm->nested.exit_required)) {
> nested_svm_vmexit(svm);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d02aeff..c290794 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4420,7 +4420,19 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
> return 1;
> }
>
> - *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
> + /*
> + * If the exit was due to an NPF we may already have a GPA.
> + * If the GPA is present, use it to avoid the GVA-to-GPA table
> + * walk. Note, this cannot be used on string operations since
> + * a string operation using rep will only have the initial GPA
> + * from when the NPF occurred.
> + */
> + if (vcpu->arch.gpa_available &&
> + !(vcpu->arch.emulate_ctxt.d & CTXT_STRING_OP))
> + *gpa = exception->address;
> + else
> + *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access,
> + exception);
>
> if (*gpa == UNMAPPED_GVA)
> return -1;
> @@ -5542,6 +5554,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> }
>
> restart:
> + /* Save the faulting GPA (cr2) in the address field */
> + ctxt->exception.address = cr2;
> +
> r = x86_emulate_insn(ctxt);
>
> if (r == EMULATION_INTERCEPTED)
>
* Re: [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes
2016-11-14 22:15 ` [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes Brijesh Singh
@ 2016-11-21 15:12 ` Paolo Bonzini
2016-11-22 22:15 ` Tom Lendacky
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-11-21 15:12 UTC (permalink / raw)
To: Brijesh Singh, kvm
Cc: Thomas.Lendacky, rkrcmar, joro, x86, linux-kernel, mingo, hpa,
tglx, bp
On 14/11/2016 23:15, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> AMD hardware adds two additional bits to aid in nested page fault handling.
>
> Bit 32 - NPF occurred while translating the guest's final physical address
> Bit 33 - NPF occurred while translating the guest page tables
I have two questions out of curiosity, and to better understand the
differences between Intel and AMD:
1) are the two bits mutually exclusive, and is one bit always set?
2) what bit is set if the processor is reading the PDPTEs of a 32-bit
PAE guest?
Thanks,
Paolo
> The guest page tables fault indicator can be used as an aid for nested
> virtualization. Using V0 for the host, V1 for the first-level guest and
> V2 for the second-level guest, when both V1 and V2 are using nested paging
> there are currently a number of unnecessary instruction emulations. When
> V2 is launched, shadow paging is used in V1 for the nested tables of V2. As
> a result, KVM marks these pages as RO in the host nested page tables. When
> V2 exits and we resume V1, these pages are still marked RO.
>
> Every nested walk of a guest page table is treated as a user-level write
> access, and this causes a lot of NPFs because the V1 page tables are marked
> RO in the V0 nested tables. When these NPFs occur while executing V1, KVM
> sees a write to a read-only page, emulates the V1 instruction and unprotects
> the page (marking it RW). This patch looks for the case where we get an NPF
> due to a guest page table walk where the page was marked RO. It immediately
> unprotects the page and resumes the guest, leading to far fewer instruction
> emulations when nested virtualization is used.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Reviewed-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
> arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
> arch/x86/kvm/mmu.c | 20 ++++++++++++++++++--
> arch/x86/kvm/svm.c | 2 +-
> 3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index bdde807..da07e17 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -191,6 +191,8 @@ enum {
> #define PFERR_RSVD_BIT 3
> #define PFERR_FETCH_BIT 4
> #define PFERR_PK_BIT 5
> +#define PFERR_GUEST_FINAL_BIT 32
> +#define PFERR_GUEST_PAGE_BIT 33
>
> #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
> #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> @@ -198,6 +200,13 @@ enum {
> #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
> #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
> #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
> +#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
> +#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
> +
> +#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> + PFERR_USER_MASK | \
> + PFERR_WRITE_MASK | \
> + PFERR_PRESENT_MASK)
>
> /* apic attention bits */
> #define KVM_APIC_CHECK_VAPIC 0
> @@ -1203,7 +1212,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
>
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>
> -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
> +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
> void *insn, int insn_len);
> void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
> void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d9c7e98..f633d29 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4508,7 +4508,7 @@ static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
> kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
> }
>
> -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
> +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
> void *insn, int insn_len)
> {
> int r, emulation_type = EMULTYPE_RETRY;
> @@ -4527,12 +4527,28 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
> return r;
> }
>
> - r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
> + r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code),
> + false);
> if (r < 0)
> return r;
> if (!r)
> return 1;
>
> + /*
> + * Before emulating the instruction, check if the error code
> + * was due to a RO violation while translating the guest page.
> + * This can occur when using nested virtualization with nested
> + * paging in both guests. If true, we simply unprotect the page
> + * and resume the guest.
> + *
> + * Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
> + * in PFERR_NESTED_GUEST_PAGE)
> + */
> + if (error_code == PFERR_NESTED_GUEST_PAGE) {
> + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
> + return 1;
> + }
> +
> if (mmio_info_in_cache(vcpu, cr2, direct))
> emulation_type = 0;
> emulate:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8ca1eca..4e462bb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2074,7 +2074,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
> static int pf_interception(struct vcpu_svm *svm)
> {
> u64 fault_address = svm->vmcb->control.exit_info_2;
> - u32 error_code;
> + u64 error_code;
> int r = 1;
>
> switch (svm->apf_reason) {
>
* Re: [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk
2016-11-21 15:07 ` Paolo Bonzini
@ 2016-11-22 14:13 ` Tom Lendacky
0 siblings, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2016-11-22 14:13 UTC (permalink / raw)
To: Paolo Bonzini, Brijesh Singh, kvm
Cc: rkrcmar, joro, x86, linux-kernel, mingo, hpa, tglx, bp
On 11/21/2016 9:07 AM, Paolo Bonzini wrote:
>
>
> On 14/11/2016 23:16, Brijesh Singh wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> When a guest causes an NPF which requires emulation, KVM sometimes walks
>> the guest page tables to translate the GVA to a GPA. This is unnecessary
>> most of the time on AMD hardware since the hardware provides the GPA in
>> EXITINFO2.
>>
>> The only exceptions are string operations that use a rep prefix and
>> instructions that access two memory locations. With rep, the GPA will only
>> be the value of the initial NPF, and with two memory operands we won't know
>> which memory address was translated into EXITINFO2.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Reviewed-by: Borislav Petkov <bp@suse.de>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>> arch/x86/include/asm/kvm_emulate.h | 3 +++
>> arch/x86/include/asm/kvm_host.h | 3 +++
>> arch/x86/kvm/svm.c | 9 ++++++++-
>> arch/x86/kvm/x86.c | 17 ++++++++++++++++-
>> 4 files changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
>> index e9cd7be..2d1ac09 100644
>> --- a/arch/x86/include/asm/kvm_emulate.h
>> +++ b/arch/x86/include/asm/kvm_emulate.h
>> @@ -344,6 +344,9 @@ struct x86_emulate_ctxt {
>> struct read_cache mem_read;
>> };
>>
>> +/* String operation identifier (matches the definition in emulate.c) */
>> +#define CTXT_STRING_OP (1 << 13)
>> +
>> /* Repeat String Operation Prefix */
>> #define REPE_PREFIX 0xf3
>> #define REPNE_PREFIX 0xf2
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 77cb3f9..fd5b1c8 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -668,6 +668,9 @@ struct kvm_vcpu_arch {
>>
>> int pending_ioapic_eoi;
>> int pending_external_vector;
>> +
>> + /* GPA available (AMD only) */
>> + bool gpa_available;
>> };
>>
>> struct kvm_lpage_info {
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 5e64e656..b442c5a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -275,6 +275,9 @@ static int avic;
>> module_param(avic, int, S_IRUGO);
>> #endif
>>
>> +/* EXITINFO2 contains valid GPA */
>> +static bool gpa_avail = true;
>> +
>> /* AVIC VM ID bit masks and lock */
>> static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR);
>> static DEFINE_SPINLOCK(avic_vm_id_lock);
>> @@ -1055,8 +1058,10 @@ static __init int svm_hardware_setup(void)
>> goto err;
>> }
>>
>> - if (!boot_cpu_has(X86_FEATURE_NPT))
>> + if (!boot_cpu_has(X86_FEATURE_NPT)) {
>> npt_enabled = false;
>> + gpa_avail = false;
>
> This is not necessary, since you will never have exit_code ==
> SVM_EXIT_NPF && !gpa_avail.
Ah, yes, that is a bit redundant. We'll get rid of the gpa_avail
flag.
Thanks,
Tom
>
>> + }
>>
>> if (npt_enabled && !npt) {
>> printk(KERN_INFO "kvm: Nested Paging disabled\n");
>> @@ -4192,6 +4197,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
>> vcpu->arch.cr0 = svm->vmcb->save.cr0;
>> if (npt_enabled)
>> vcpu->arch.cr3 = svm->vmcb->save.cr3;
>> + if (gpa_avail)
>> + vcpu->arch.gpa_available = (exit_code == SVM_EXIT_NPF);
>
> The "if", and the body moved just before
>
> return svm_exit_handlers[exit_code](svm);
>
> I'll try doing a similar patch for Intel too, for better testing.
>
>>
>> if (unlikely(svm->nested.exit_required)) {
>> nested_svm_vmexit(svm);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index d02aeff..c290794 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4420,7 +4420,19 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>> return 1;
>> }
>>
>> - *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
>> + /*
>> + * If the exit was due to an NPF we may already have a GPA.
>> + * If the GPA is present, use it to avoid the GVA-to-GPA table
>> + * walk. Note, this cannot be used on string operations since
>> + * a string operation using rep will only have the initial GPA
>> + * from when the NPF occurred.
>> + */
>> + if (vcpu->arch.gpa_available &&
>> + !(vcpu->arch.emulate_ctxt.d & CTXT_STRING_OP))
>> + *gpa = exception->address;
>> + else
>> + *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access,
>> + exception);
>>
>> if (*gpa == UNMAPPED_GVA)
>> return -1;
>> @@ -5542,6 +5554,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>> }
>>
>> restart:
>> + /* Save the faulting GPA (cr2) in the address field */
>> + ctxt->exception.address = cr2;
>> +
>> r = x86_emulate_insn(ctxt);
>>
>> if (r == EMULATION_INTERCEPTED)
>>
* Re: [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support
2016-11-21 14:50 ` Paolo Bonzini
@ 2016-11-22 14:25 ` Tom Lendacky
2016-11-22 15:12 ` Paolo Bonzini
0 siblings, 1 reply; 13+ messages in thread
From: Tom Lendacky @ 2016-11-22 14:25 UTC (permalink / raw)
To: Paolo Bonzini, Brijesh Singh, kvm
Cc: rkrcmar, joro, x86, linux-kernel, mingo, hpa, tglx, bp
On 11/21/2016 8:50 AM, Paolo Bonzini wrote:
>
>
> On 14/11/2016 23:15, Brijesh Singh wrote:
>> + /* For size less than 4 we merge, else we zero extend */
>> + val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
>
> Are you sure it shouldn't always zero extend the high 32-bits? So "val"
> should be declared as u32.
>
> Paolo
It should only zero-extend when dealing with a 32-bit operation. Any use
of 8- or 16-bit registers leaves the upper 56 or 48 bits as-is (see
section 3.1.2.3 in http://support.amd.com/TechDocs/24592.pdf).
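Concretely (a rough sketch of the architectural rule, not the patch code):

    unsigned long rax = kvm_register_read(vcpu, VCPU_REGS_RAX);

    switch (size) {
    case 1:    /* AL: merge, upper 56 bits preserved */
        rax = (rax & ~0xffUL) | (u8)data;
        break;
    case 2:    /* AX: merge, upper 48 bits preserved */
        rax = (rax & ~0xffffUL) | (u16)data;
        break;
    case 4:    /* EAX: a 32-bit write zero-extends to 64 bits */
        rax = (u32)data;
        break;
    }
    kvm_register_write(vcpu, VCPU_REGS_RAX, rax);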
Thanks,
Tom
>
>> + ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
>> + &val, 1);
>> + if (ret) {
>> + kvm_register_write(vcpu, VCPU_REGS_RAX, val);
>> + return ret;
>> + }
* Re: [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support
2016-11-22 14:25 ` Tom Lendacky
@ 2016-11-22 15:12 ` Paolo Bonzini
0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-11-22 15:12 UTC (permalink / raw)
To: Tom Lendacky
Cc: Brijesh Singh, kvm, rkrcmar, joro, x86, linux-kernel, mingo, hpa,
tglx, bp
----- Original Message -----
> From: "Tom Lendacky" <thomas.lendacky@amd.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, "Brijesh Singh" <brijesh.singh@amd.com>, kvm@vger.kernel.org
> Cc: rkrcmar@redhat.com, joro@8bytes.org, x86@kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com,
> hpa@zytor.com, tglx@linutronix.de, bp@suse.de
> Sent: Tuesday, November 22, 2016 3:25:52 PM
> Subject: Re: [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support
>
> On 11/21/2016 8:50 AM, Paolo Bonzini wrote:
> >
> >
> > On 14/11/2016 23:15, Brijesh Singh wrote:
> >> + /* For size less than 4 we merge, else we zero extend */
> >> + val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
> >
> > Are you sure it shouldn't always zero extend the high 32-bits? So "val"
> > should be declared as u32.
> >
> > Paolo
>
> It should only zero extend when dealing with a 32-bit operation. Any use
> of 8 or 16 bit registers leaves the upper 56 or 48 bits as is (see
> section 3.1.2.3 in http://support.amd.com/TechDocs/24592.pdf).
Duh, right, see also assign_register in arch/x86/kvm/emulate.c.
Paolo
> Thanks,
> Tom
>
> >
> >> + ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
> >> + &val, 1);
> >> + if (ret) {
> >> + kvm_register_write(vcpu, VCPU_REGS_RAX, val);
> >> + return ret;
> >> + }
>
* Re: [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes
2016-11-21 15:12 ` Paolo Bonzini
@ 2016-11-22 22:15 ` Tom Lendacky
2016-11-22 22:29 ` Paolo Bonzini
0 siblings, 1 reply; 13+ messages in thread
From: Tom Lendacky @ 2016-11-22 22:15 UTC (permalink / raw)
To: Paolo Bonzini, Brijesh Singh, kvm
Cc: rkrcmar, joro, x86, linux-kernel, mingo, hpa, tglx, bp
On 11/21/2016 9:12 AM, Paolo Bonzini wrote:
>
>
> On 14/11/2016 23:15, Brijesh Singh wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> AMD hardware adds two additional bits to aid in nested page fault handling.
>>
>> Bit 32 - NPF occurred while translating the guest's final physical address
>> Bit 33 - NPF occurred while translating the guest page tables
>
> I have two questions out of curiosity, and to better understand the
> differences between Intel and AMD:
I talked with some folks about these questions and here's what we
determined:
>
> 1) are the two bits mutually exclusive, and is one bit always set?
The two bits are mutually exclusive - either the processor encounters
the fault while translating the final gPA or while translating a guest
page table; there's no way for it to be both.
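In code terms (a sketch):

    /* On an NPF, one or the other is set - never both */
    bool final_gpa   = error_code & PFERR_GUEST_FINAL_MASK;  /* bit 32 */
    bool guest_table = error_code & PFERR_GUEST_PAGE_MASK;   /* bit 33 */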
>
> 2) what bit is set if the processor is reading the PDPTEs of a 32-bit
> PAE guest?
I believe that bit 33 will be set. The PDPEs are considered guest
tables and are read during a guest table walk (see APM vol2 section
15.25.10). Note that this is slightly different than the bare-metal
behavior of legacy PAE mode as APM describes. I'll try to test this
and verify it.
Thanks,
Tom
>
> Thanks,
>
> Paolo
>
>> The guest page tables fault indicator can be used as an aid for nested
>> virtualization. Using V0 for the host, V1 for the first-level guest and
>> V2 for the second-level guest, when both V1 and V2 are using nested paging
>> there are currently a number of unnecessary instruction emulations. When
>> V2 is launched, shadow paging is used in V1 for the nested tables of V2. As
>> a result, KVM marks these pages as RO in the host nested page tables. When
>> V2 exits and we resume V1, these pages are still marked RO.
>>
>> Every nested walk of a guest page table is treated as a user-level write
>> access, and this causes a lot of NPFs because the V1 page tables are marked
>> RO in the V0 nested tables. When these NPFs occur while executing V1, KVM
>> sees a write to a read-only page, emulates the V1 instruction and unprotects
>> the page (marking it RW). This patch looks for the case where we get an NPF
>> due to a guest page table walk where the page was marked RO. It immediately
>> unprotects the page and resumes the guest, leading to far fewer instruction
>> emulations when nested virtualization is used.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Reviewed-by: Borislav Petkov <bp@suse.de>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
>> arch/x86/kvm/mmu.c | 20 ++++++++++++++++++--
>> arch/x86/kvm/svm.c | 2 +-
>> 3 files changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index bdde807..da07e17 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -191,6 +191,8 @@ enum {
>> #define PFERR_RSVD_BIT 3
>> #define PFERR_FETCH_BIT 4
>> #define PFERR_PK_BIT 5
>> +#define PFERR_GUEST_FINAL_BIT 32
>> +#define PFERR_GUEST_PAGE_BIT 33
>>
>> #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
>> #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
>> @@ -198,6 +200,13 @@ enum {
>> #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
>> #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
>> #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
>> +#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
>> +#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
>> +
>> +#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
>> + PFERR_USER_MASK | \
>> + PFERR_WRITE_MASK | \
>> + PFERR_PRESENT_MASK)
>>
>> /* apic attention bits */
>> #define KVM_APIC_CHECK_VAPIC 0
>> @@ -1203,7 +1212,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
>>
>> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>>
>> -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
>> +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
>> void *insn, int insn_len);
>> void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
>> void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index d9c7e98..f633d29 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4508,7 +4508,7 @@ static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
>> kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>> }
>>
>> -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
>> +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
>> void *insn, int insn_len)
>> {
>> int r, emulation_type = EMULTYPE_RETRY;
>> @@ -4527,12 +4527,28 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
>> return r;
>> }
>>
>> - r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
>> + r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code),
>> + false);
>> if (r < 0)
>> return r;
>> if (!r)
>> return 1;
>>
>> + /*
>> + * Before emulating the instruction, check if the error code
>> + * was due to a RO violation while translating the guest page.
>> + * This can occur when using nested virtualization with nested
>> + * paging in both guests. If true, we simply unprotect the page
>> + * and resume the guest.
>> + *
>> + * Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
>> + * in PFERR_NESTED_GUEST_PAGE)
>> + */
>> + if (error_code == PFERR_NESTED_GUEST_PAGE) {
>> + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
>> + return 1;
>> + }
>> +
>> if (mmio_info_in_cache(vcpu, cr2, direct))
>> emulation_type = 0;
>> emulate:
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 8ca1eca..4e462bb 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -2074,7 +2074,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
>> static int pf_interception(struct vcpu_svm *svm)
>> {
>> u64 fault_address = svm->vmcb->control.exit_info_2;
>> - u32 error_code;
>> + u64 error_code;
>> int r = 1;
>>
>> switch (svm->apf_reason) {
>>
* Re: [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes
2016-11-22 22:15 ` Tom Lendacky
@ 2016-11-22 22:29 ` Paolo Bonzini
0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-11-22 22:29 UTC (permalink / raw)
To: Tom Lendacky, Brijesh Singh, kvm
Cc: rkrcmar, joro, x86, linux-kernel, mingo, hpa, tglx, bp
On 22/11/2016 23:15, Tom Lendacky wrote:
> > 2) what bit is set if the processor is reading the PDPTEs of a 32-bit
> > PAE guest?
>
> I believe that bit 33 will be set. The PDPEs are considered guest
> tables and are read during a guest table walk (see APM vol2 section
> 15.25.10). Note that this is slightly different than the bare-metal
> behavior of legacy PAE mode as APM describes. I'll try to test this
> and verify it.
No big deal; indeed it's a bit different from Intel, which caches the
four PDPEs, but it's enough to know that bit 33 will be set.
Paolo
Thread overview: 13+ messages
2016-11-14 22:15 [PATCH v1 0/3] x86: SVM: add additional SVM NPF error and use HW GPA Brijesh Singh
2016-11-14 22:15 ` [PATCH v1 1/3] kvm: svm: Add support for additional SVM NPF error codes Brijesh Singh
2016-11-21 15:12 ` Paolo Bonzini
2016-11-22 22:15 ` Tom Lendacky
2016-11-22 22:29 ` Paolo Bonzini
2016-11-14 22:15 ` [PATCH v1 2/3] kvm: svm: Add kvm_fast_pio_in support Brijesh Singh
2016-11-21 14:50 ` Paolo Bonzini
2016-11-22 14:25 ` Tom Lendacky
2016-11-22 15:12 ` Paolo Bonzini
2016-11-14 22:16 ` [PATCH v1 3/3] kvm: svm: Use the hardware provided GPA instead of page walk Brijesh Singh
2016-11-21 15:07 ` Paolo Bonzini
2016-11-22 14:13 ` Tom Lendacky