* [PATCH 1/9] KVM, pkeys: expose CPUID:PKU to guest
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch exposes X86_FEATURE_PKU to the guest. X86_FEATURE_PKU is referred
to as "PKU" in the hardware documentation and is enumerated by
CPUID.7.0.ECX[3]:PKU.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 156441b..29e6502 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -354,6 +354,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
const u32 kvm_supported_word10_x86_features =
F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
+ /* cpuid 7.0.ecx */
+ const u32 kvm_supported_word11_x86_features = F(PKU) | 0 /*OSPKE*/;
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -431,10 +434,13 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
cpuid_mask(&entry->ebx, 9);
// TSC_ADJUST is emulated
entry->ebx |= F(TSC_ADJUST);
- } else
+ entry->ecx &= kvm_supported_word11_x86_features;
+ cpuid_mask(&entry->ecx, 13);
+ } else {
entry->ebx = 0;
+ entry->ecx = 0;
+ }
entry->eax = 0;
- entry->ecx = 0;
entry->edx = 0;
break;
}
--
2.4.3
* [PATCH 2/9] KVM, pkeys: add pkeys support when setting CR4
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch adds pkeys support when setting CR4: CR4.PKE (bit 22) is no
longer treated as reserved, setting it is rejected unless the guest's CPUID
exposes PKU, and toggling it forces an MMU context reset via pdptr_bits.
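As an aside, here is a minimal sketch (my illustration, not from the series)
of what a 64-bit guest does once KVM accepts CR4.PKE: set the bit from ring 0,
then program PKRU with WRPKRU, which architecturally requires ECX=0 and EDX=0.
The opcode bytes are used because older assemblers lack the mnemonic:

static inline void guest_enable_pkeys(void)
{
        unsigned long cr4;

        /* Set CR4.PKE (bit 22); must run at CPL 0 in the guest */
        asm volatile("mov %%cr4, %0" : "=r" (cr4));
        cr4 |= 1UL << 22;
        asm volatile("mov %0, %%cr4" : : "r" (cr4));
}

static inline void wrpkru(unsigned int pkru)
{
        /* WRPKRU (0f 01 ef): EAX = new PKRU value, ECX and EDX must be 0 */
        asm volatile(".byte 0x0f, 0x01, 0xef"
                     : : "a" (pkru), "c" (0), "d" (0));
}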
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c12e845..3bbc1cb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -55,7 +55,8 @@
| X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
- | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE | X86_CR4_SMAP))
+ | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE | X86_CR4_SMAP \
+ | X86_CR4_PKE))
#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..7775158 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -70,6 +70,14 @@ static inline bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu)
return best && (best->ebx & bit(X86_FEATURE_FSGSBASE));
}
+static inline bool guest_cpuid_has_pku(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 7, 0);
+ return best && (best->ecx & bit(X86_FEATURE_PKU));
+}
+
static inline bool guest_cpuid_has_longmode(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d4e54d..d268da0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -709,7 +709,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
- X86_CR4_SMEP | X86_CR4_SMAP;
+ X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
if (cr4 & CR4_RESERVED_BITS)
return 1;
@@ -726,6 +726,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
if (!guest_cpuid_has_fsgsbase(vcpu) && (cr4 & X86_CR4_FSGSBASE))
return 1;
+ if (!guest_cpuid_has_pku(vcpu) && (cr4 & X86_CR4_PKE))
+ return 1;
+
if (is_long_mode(vcpu)) {
if (!(cr4 & X86_CR4_PAE))
return 1;
--
2.4.3
* [PATCH 3/9] KVM, pkeys: expose CPUID:OSPKE to guest
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch exposes X86_FEATURE_OSPKE to the guest. X86_FEATURE_OSPKE
indicates OS support for pkeys, is enumerated with CPUID.7.0.ECX[4]:OSPKE,
and reflects the current setting of CR4.PKE.
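A sketch of the guest-side consequence (mine, not part of the patch): code
should key off OSPKE rather than PKU before using RDPKRU, since OSPKE tracks
the live CR4.PKE value that this patch mirrors into CPUID. Reusing the
cpuid_count() helper sketched under patch 1:

static inline unsigned int rdpkru_if_enabled(void)
{
        uint32_t eax, ebx, ecx, edx;
        unsigned int pkru = 0;

        cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
        if (ecx & (1u << 4)) {          /* CPUID.7.0.ECX[4]:OSPKE */
                /* RDPKRU (0f 01 ee): ECX must be 0, PKRU returned in EAX */
                asm volatile(".byte 0x0f, 0x01, 0xee"
                             : "=a" (pkru) : "c" (0) : "edx");
        }
        return pkru;
}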
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 29e6502..ece687b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -81,6 +81,17 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
apic->lapic_timer.timer_mode_mask = 1 << 17;
}
+ best = kvm_find_cpuid_entry(vcpu, 7, 0);
+ if (!best)
+ return 0;
+
+ /* Update OSPKE bit */
+ if (boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) {
+ best->ecx &= ~F(OSPKE);
+ if (kvm_read_cr4_bits(vcpu, X86_CR4_PKE))
+ best->ecx |= F(OSPKE);
+ }
+
best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
if (!best) {
vcpu->arch.guest_supported_xcr0 = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d268da0..5181834 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,7 +754,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
(!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
kvm_mmu_reset_context(vcpu);
- if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE)
+ if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
kvm_update_cpuid(vcpu);
return 0;
@@ -6841,7 +6841,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
- if (sregs->cr4 & X86_CR4_OSXSAVE)
+ if (sregs->cr4 & (X86_CR4_OSXSAVE | X86_CR4_PKE))
kvm_update_cpuid(vcpu);
idx = srcu_read_lock(&vcpu->kvm->srcu);
--
2.4.3
* Re: [PATCH 3/9] KVM, pkeys: expose CPUID:OSPKE to guest
From: Paolo Bonzini @ 2015-11-09 12:32 UTC
To: Huaitong Han, gleb; +Cc: kvm
On 09/11/2015 12:54, Huaitong Han wrote:
> This patch exposes X86_FEATURE_OSPKE to the guest. X86_FEATURE_OSPKE
> indicates OS support for pkeys, is enumerated with CPUID.7.0.ECX[4]:OSPKE,
> and reflects the current setting of CR4.PKE.
>
> Signed-off-by: Huaitong Han <huaitong.han@intel.com>
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 29e6502..ece687b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -81,6 +81,17 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> apic->lapic_timer.timer_mode_mask = 1 << 17;
> }
>
> + best = kvm_find_cpuid_entry(vcpu, 7, 0);
> + if (!best)
> + return 0;
> +
> + /* Update OSPKE bit */
> + if (boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) {
> + best->ecx &= ~F(OSPKE);
> + if (kvm_read_cr4_bits(vcpu, X86_CR4_PKE))
> + best->ecx |= F(OSPKE);
> + }
> +
> best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
> if (!best) {
> vcpu->arch.guest_supported_xcr0 = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d268da0..5181834 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -754,7 +754,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> kvm_mmu_reset_context(vcpu);
>
> - if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE)
> + if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
> kvm_update_cpuid(vcpu);
>
> return 0;
> @@ -6841,7 +6841,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>
> mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
> kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
> - if (sregs->cr4 & X86_CR4_OSXSAVE)
> + if (sregs->cr4 & (X86_CR4_OSXSAVE | X86_CR4_PKE))
> kvm_update_cpuid(vcpu);
>
> idx = srcu_read_lock(&vcpu->kvm->srcu);
>
Hi, please squash these first three patches and move them last.
Paolo
* [PATCH 4/9] KVM, pkeys: disable pkeys for guests in non-paging mode
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
Pkeys are disabled if the CPU is in non-paging mode in hardware. However,
KVM always uses paging mode to emulate guest non-paging mode with TDP. To
emulate this behavior, pkeys need to be manually disabled when the guest
switches to non-paging mode.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d019868..9b12c80 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3645,14 +3645,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
hw_cr4 &= ~X86_CR4_PAE;
hw_cr4 |= X86_CR4_PSE;
/*
- * SMEP/SMAP is disabled if CPU is in non-paging mode
- * in hardware. However KVM always uses paging mode to
- * emulate guest non-paging mode with TDP.
- * To emulate this behavior, SMEP/SMAP needs to be
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging
+ * mode in hardware. However KVM always uses paging
+ * mode to emulate guest non-paging mode with TDP.
+ * To emulate this behavior, SMEP/SMAP/PKU needs to be
* manually disabled when guest switches to non-paging
* mode.
*/
- hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+ hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
} else if (!(cr4 & X86_CR4_PAE)) {
hw_cr4 &= ~X86_CR4_PAE;
}
--
2.4.3
* [PATCH 5/9] KVM, pkeys: update memory permission bitmask for pkeys
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
Pkeys define a new status bit in the PFEC: PFEC.PK (bit 5). If certain
conditions are true, the fault is considered a PKU violation.
This patch updates the memory permission bitmask for pkeys.
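To see why the table doubles to 32 bytes, consider the lookup it feeds
(condensed by me from permission_fault(); the helper itself is illustrative,
though the field names match the KVM source). The byte index is the error
code shifted right by one, so admitting PFERR_PK_BIT=5 widens the index from
pfec[4:1] (16 bytes) to pfec[5:1] (32 bytes):

static bool is_permission_fault(struct kvm_mmu *mmu,
                                unsigned pte_access, unsigned pfec)
{
        int index = pfec >> 1;  /* pfec bits [5:1] select one of 32 bytes */

        /* bit index within the byte: pte permissions in ACC_* format */
        return (mmu->permissions[index] >> pte_access) & 1;
}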
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3bbc1cb..3fc6ada 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -159,12 +159,14 @@ enum {
#define PFERR_USER_BIT 2
#define PFERR_RSVD_BIT 3
#define PFERR_FETCH_BIT 4
+#define PFERR_PK_BIT 5
#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
+#define PFERR_PK_MASK (1U << PFERR_PK_BIT)
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
@@ -290,8 +292,10 @@ struct kvm_mmu {
* Bitmap; bit set = permission fault
* Byte index: page fault error code [4:1]
* Bit index: pte permissions in ACC_* format
+ *
+ * Add PFEC.PK (bit 5) for protection-key violations
*/
- u8 permissions[16];
+ u8 permissions[32];
u64 *pae_root;
u64 *lm_root;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 69088a1..0568635 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3793,16 +3793,22 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
{
unsigned bit, byte, pfec;
u8 map;
- bool fault, x, w, u, wf, uf, ff, smapf, cr4_smap, cr4_smep, smap = 0;
+ bool fault, x, w, u, smap = 0, pku = 0;
+ bool wf, uf, ff, smapf, rsvdf, pkuf;
+ bool cr4_smap, cr4_smep, cr4_pku;
cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
+ cr4_pku = kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
+
for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
pfec = byte << 1;
map = 0;
wf = pfec & PFERR_WRITE_MASK;
uf = pfec & PFERR_USER_MASK;
ff = pfec & PFERR_FETCH_MASK;
+ rsvdf = pfec & PFERR_RSVD_MASK;
+ pkuf = pfec & PFERR_PK_MASK;
/*
* PFERR_RSVD_MASK bit is set in PFEC if the access is not
* subject to SMAP restrictions, and cleared otherwise. The
@@ -3841,12 +3847,34 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
* clearer.
*/
smap = cr4_smap && u && !uf && !ff;
+
+ /*
+ * PKU:additional mechanism by which the paging
+ * controls access to user-mode addresses based
+ * on the value in the PKRU register. A fault is
+ * considered as a PKU violation if all of the
+ * following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit
+ * violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ * or the access is a data write and
+ * PKRU.WD=1 and either CR0.WP=1
+ * or it is a user access.
+ *
+ * The 2nd and 6th conditions are computed
+ * dynamically in permission_fault.
+ */
+ pku = cr4_pku && !rsvdf && !ff && u;
} else
/* Not really needed: no U/S accesses on ept */
u = 1;
fault = (ff && !x) || (uf && !u) || (wf && !w) ||
- (smapf && smap);
+ (smapf && smap) || (pkuf && pku);
map |= fault << bit;
}
mmu->permissions[byte] = map;
--
2.4.3
* [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits
62:59 of the leaf entries of the page tables. The PKEY is an index into the
PKRU register (16 domains); every domain has 2 bits (a write-disable bit and
an access-disable bit).
The static logic is handled in update_permission_bitmask; the dynamic logic
needs to read the pkey from the page table entries, read the PKRU value, and
deduce the correct result.
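Condensed into plain C, the dynamic check described above looks roughly like
this (my sketch, not the patch itself, which spreads the logic across
permission_fault() and FNAME(gpte_pkeys)):

static inline unsigned pte_pkey(uint64_t pte)
{
        return (pte >> 59) & 0xf;       /* protection key, PTE bits 62:59 */
}

static inline bool pkey_blocks_access(uint32_t pkru, unsigned pkey,
                                      bool write, bool user_or_wp)
{
        bool ad = (pkru >> (pkey * 2)) & 1;     /* access-disable bit */
        bool wd = (pkru >> (pkey * 2 + 1)) & 1; /* write-disable bit */

        /* AD blocks all data access; WD blocks writes when CR0.WP or user */
        return ad || (wd && write && user_or_wp);
}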
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e4202e4..bbb5555 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -3,6 +3,7 @@
#include <linux/kvm_host.h>
#include "kvm_cache_regs.h"
+#include "x86.h"
#define PT64_PT_BITS 9
#define PT64_ENT_PER_PAGE (1 << PT64_PT_BITS)
@@ -24,6 +25,11 @@
#define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
#define PT_PAT_MASK (1ULL << 7)
#define PT_GLOBAL_MASK (1ULL << 8)
+
+#define PT64_PKEY_BIT0 (1ULL << _PAGE_BIT_PKEY_BIT0)
+#define PT64_PKEY_BIT1 (1ULL << _PAGE_BIT_PKEY_BIT1)
+#define PT64_PKEY_BIT2 (1ULL << _PAGE_BIT_PKEY_BIT2)
+#define PT64_PKEY_BIT3 (1ULL << _PAGE_BIT_PKEY_BIT3)
#define PT64_NX_SHIFT 63
#define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
@@ -45,6 +51,15 @@
#define PT_PAGE_TABLE_LEVEL 1
#define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)
+#define PKEYS_BIT0_VALUE (1ULL << 0)
+#define PKEYS_BIT1_VALUE (1ULL << 1)
+#define PKEYS_BIT2_VALUE (1ULL << 2)
+#define PKEYS_BIT3_VALUE (1ULL << 3)
+
+#define PKRU_READ 0
+#define PKRU_WRITE 1
+#define PKRU_ATTRS 2
+
static inline u64 rsvd_bits(int s, int e)
{
return ((1ULL << (e - s + 1)) - 1) << s;
@@ -145,10 +160,44 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
* fault with the given access (in ACC_* format)?
*/
static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- unsigned pte_access, unsigned pfec)
+ unsigned pte_access, unsigned pte_pkeys, unsigned pfec)
{
- int cpl = kvm_x86_ops->get_cpl(vcpu);
- unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
+ unsigned long smap, rflags;
+ u32 pkru;
+ int cpl, index;
+ bool wf, uf, pk, pkru_ad, pkru_wd;
+
+ cpl = kvm_x86_ops->get_cpl(vcpu);
+ rflags = kvm_x86_ops->get_rflags(vcpu);
+
+ pkru = read_pkru();
+
+ /*
+ * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
+ * domain in pkru, pkey is index to a defined domain, so the value of
+ * pkey * PKRU_ATTRS + R/W is offset of a defined domain attribute.
+ */
+ pkru_ad = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_READ)) & 1;
+ pkru_wd = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_WRITE)) & 1;
+
+ wf = pfec & PFERR_WRITE_MASK;
+ uf = pfec & PFERR_USER_MASK;
+
+ /*
+ * PKeys 2nd and 6th conditions:
+ * 2.EFER_LMA=1
+ * 6.PKRU.AD=1
+ * or The access is a data write and PKRU.WD=1 and
+ * either CR0.WP=1 or it is a user mode access
+ */
+ pk = is_long_mode(vcpu) && (pkru_ad ||
+ (pkru_wd && wf && (is_write_protection(vcpu) || uf)));
+
+ /*
+ * PK bit right value in pfec equal to
+ * PK bit current value in pfec and pk value.
+ */
+ pfec &= (pk << PFERR_PK_BIT) + ~PFERR_PK_MASK;
/*
* If CPL < 3, SMAP prevention are disabled if EFLAGS.AC = 1.
@@ -163,8 +212,8 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
* but it will be one in index if SMAP checks are being overridden.
* It is important to keep this branchless.
*/
- unsigned long smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
- int index = (pfec >> 1) +
+ smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
+ index = (pfec >> 1) +
(smap >> (X86_EFLAGS_AC_BIT - PFERR_RSVD_BIT + 1));
WARN_ON(pfec & PFERR_RSVD_MASK);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 736e6ab..99563bc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -253,6 +253,17 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
}
return 0;
}
+static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte)
+{
+ unsigned pkeys = 0;
+#if PTTYPE == 64
+ pkeys = ((gpte & PT64_PKEY_BIT0) ? PKEYS_BIT0_VALUE : 0) |
+ ((gpte & PT64_PKEY_BIT1) ? PKEYS_BIT1_VALUE : 0) |
+ ((gpte & PT64_PKEY_BIT2) ? PKEYS_BIT2_VALUE : 0) |
+ ((gpte & PT64_PKEY_BIT3) ? PKEYS_BIT3_VALUE : 0);
+#endif
+ return pkeys;
+}
/*
* Fetch a guest pte for a guest virtual address
@@ -265,12 +276,13 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
pt_element_t pte;
pt_element_t __user *uninitialized_var(ptep_user);
gfn_t table_gfn;
- unsigned index, pt_access, pte_access, accessed_dirty;
+ unsigned index, pt_access, pte_access, accessed_dirty, pte_pkeys;
gpa_t pte_gpa;
int offset;
const int write_fault = access & PFERR_WRITE_MASK;
const int user_fault = access & PFERR_USER_MASK;
const int fetch_fault = access & PFERR_FETCH_MASK;
+ const int pk_fault = access & PFERR_PK_MASK;
u16 errcode = 0;
gpa_t real_gpa;
gfn_t gfn;
@@ -356,7 +368,9 @@ retry_walk:
walker->ptes[walker->level - 1] = pte;
} while (!is_last_gpte(mmu, walker->level, pte));
- if (unlikely(permission_fault(vcpu, mmu, pte_access, access))) {
+ pte_pkeys = FNAME(gpte_pkeys)(vcpu, pte);
+ if (unlikely(permission_fault(vcpu, mmu, pte_access, pte_pkeys,
+ access))) {
errcode |= PFERR_PRESENT_MASK;
goto error;
}
@@ -399,7 +413,7 @@ retry_walk:
return 1;
error:
- errcode |= write_fault | user_fault;
+ errcode |= write_fault | user_fault | pk_fault;
if (fetch_fault && (mmu->nx ||
kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
errcode |= PFERR_FETCH_MASK;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5181834..7a84b83 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4107,7 +4107,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
if (vcpu_match_mmio_gva(vcpu, gva)
&& !permission_fault(vcpu, vcpu->arch.walk_mmu,
- vcpu->arch.access, access)) {
+ vcpu->arch.access, 0, access)) {
*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
(gva & (PAGE_SIZE - 1));
trace_vcpu_match_mmio(gva, *gpa, write, false);
--
2.4.3
* Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic
From: Paolo Bonzini @ 2015-11-09 12:43 UTC
To: Huaitong Han, gleb; +Cc: kvm
On 09/11/2015 12:54, Huaitong Han wrote:
> Protection keys define a new 4-bit protection key field (PKEY) in bits
> 62:59 of the leaf entries of the page tables. The PKEY is an index into the
> PKRU register (16 domains); every domain has 2 bits (a write-disable bit and
> an access-disable bit).
>
> The static logic is handled in update_permission_bitmask; the dynamic logic
> needs to read the pkey from the page table entries, read the PKRU value, and
> deduce the correct result.
>
> Signed-off-by: Huaitong Han <huaitong.han@intel.com>
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e4202e4..bbb5555 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -3,6 +3,7 @@
>
> #include <linux/kvm_host.h>
> #include "kvm_cache_regs.h"
> +#include "x86.h"
>
> #define PT64_PT_BITS 9
> #define PT64_ENT_PER_PAGE (1 << PT64_PT_BITS)
> @@ -24,6 +25,11 @@
> #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
> #define PT_PAT_MASK (1ULL << 7)
> #define PT_GLOBAL_MASK (1ULL << 8)
> +
> +#define PT64_PKEY_BIT0 (1ULL << _PAGE_BIT_PKEY_BIT0)
> +#define PT64_PKEY_BIT1 (1ULL << _PAGE_BIT_PKEY_BIT1)
> +#define PT64_PKEY_BIT2 (1ULL << _PAGE_BIT_PKEY_BIT2)
> +#define PT64_PKEY_BIT3 (1ULL << _PAGE_BIT_PKEY_BIT3)
> #define PT64_NX_SHIFT 63
> #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
>
> @@ -45,6 +51,15 @@
> #define PT_PAGE_TABLE_LEVEL 1
> #define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)
>
> +#define PKEYS_BIT0_VALUE (1ULL << 0)
> +#define PKEYS_BIT1_VALUE (1ULL << 1)
> +#define PKEYS_BIT2_VALUE (1ULL << 2)
> +#define PKEYS_BIT3_VALUE (1ULL << 3)
> +
> +#define PKRU_READ 0
> +#define PKRU_WRITE 1
> +#define PKRU_ATTRS 2
> +
> static inline u64 rsvd_bits(int s, int e)
> {
> return ((1ULL << (e - s + 1)) - 1) << s;
> @@ -145,10 +160,44 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
> * fault with the given access (in ACC_* format)?
> */
> static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> - unsigned pte_access, unsigned pfec)
> + unsigned pte_access, unsigned pte_pkeys, unsigned pfec)
> {
> - int cpl = kvm_x86_ops->get_cpl(vcpu);
> - unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> + unsigned long smap, rflags;
> + u32 pkru;
> + int cpl, index;
> + bool wf, uf, pk, pkru_ad, pkru_wd;
> +
> + cpl = kvm_x86_ops->get_cpl(vcpu);
> + rflags = kvm_x86_ops->get_rflags(vcpu);
> +
> + pkru = read_pkru();
> +
> + /*
> + * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
> + * domain in pkru, pkey is index to a defined domain, so the value of
> + * pkey * PKRU_ATTRS + R/W is offset of a defined domain attribute.
> + */
> + pkru_ad = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_READ)) & 1;
> + pkru_wd = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_WRITE)) & 1;
> +
> + wf = pfec & PFERR_WRITE_MASK;
> + uf = pfec & PFERR_USER_MASK;
> +
> + /*
> + * PKeys 2nd and 6th conditions:
> + * 2.EFER_LMA=1
> + * 6.PKRU.AD=1
> + * or The access is a data write and PKRU.WD=1 and
> + * either CR0.WP=1 or it is a user mode access
> + */
> + pk = is_long_mode(vcpu) && (pkru_ad ||
> + (pkru_wd && wf && (is_write_protection(vcpu) || uf)));
> +
> + /*
> + * PK bit right value in pfec equal to
> + * PK bit current value in pfec and pk value.
> + */
> + pfec &= (pk << PFERR_PK_BIT) + ~PFERR_PK_MASK;
PK is only applicable to guest page tables, but if you do not support
PKRU without EPT (patch 9), none of this is necessary, is it?
> /*
> * If CPL < 3, SMAP prevention are disabled if EFLAGS.AC = 1.
> @@ -163,8 +212,8 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> * but it will be one in index if SMAP checks are being overridden.
> * It is important to keep this branchless.
> */
> - unsigned long smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
> - int index = (pfec >> 1) +
> + smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
> + index = (pfec >> 1) +
> (smap >> (X86_EFLAGS_AC_BIT - PFERR_RSVD_BIT + 1));
>
> WARN_ON(pfec & PFERR_RSVD_MASK);
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 736e6ab..99563bc 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -253,6 +253,17 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
> }
> return 0;
> }
> +static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte)
> +{
> + unsigned pkeys = 0;
> +#if PTTYPE == 64
> + pkeys = ((gpte & PT64_PKEY_BIT0) ? PKEYS_BIT0_VALUE : 0) |
> + ((gpte & PT64_PKEY_BIT1) ? PKEYS_BIT1_VALUE : 0) |
> + ((gpte & PT64_PKEY_BIT2) ? PKEYS_BIT2_VALUE : 0) |
> + ((gpte & PT64_PKEY_BIT3) ? PKEYS_BIT3_VALUE : 0);
This is just pkeys = (gpte >> _PAGE_BIT_PKEY_BIT0) & 15.
Paolo
> +#endif
> + return pkeys;
> +}
>
> /*
> * Fetch a guest pte for a guest virtual address
> @@ -265,12 +276,13 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> pt_element_t pte;
> pt_element_t __user *uninitialized_var(ptep_user);
> gfn_t table_gfn;
> - unsigned index, pt_access, pte_access, accessed_dirty;
> + unsigned index, pt_access, pte_access, accessed_dirty, pte_pkeys;
> gpa_t pte_gpa;
> int offset;
> const int write_fault = access & PFERR_WRITE_MASK;
> const int user_fault = access & PFERR_USER_MASK;
> const int fetch_fault = access & PFERR_FETCH_MASK;
> + const int pk_fault = access & PFERR_PK_MASK;
> u16 errcode = 0;
> gpa_t real_gpa;
> gfn_t gfn;
> @@ -356,7 +368,9 @@ retry_walk:
> walker->ptes[walker->level - 1] = pte;
> } while (!is_last_gpte(mmu, walker->level, pte));
>
> - if (unlikely(permission_fault(vcpu, mmu, pte_access, access))) {
> + pte_pkeys = FNAME(gpte_pkeys)(vcpu, pte);
> + if (unlikely(permission_fault(vcpu, mmu, pte_access, pte_pkeys,
> + access))) {
> errcode |= PFERR_PRESENT_MASK;
> goto error;
> }
> @@ -399,7 +413,7 @@ retry_walk:
> return 1;
>
> error:
> - errcode |= write_fault | user_fault;
> + errcode |= write_fault | user_fault | pk_fault;
> if (fetch_fault && (mmu->nx ||
> kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
> errcode |= PFERR_FETCH_MASK;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5181834..7a84b83 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4107,7 +4107,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>
> if (vcpu_match_mmio_gva(vcpu, gva)
> && !permission_fault(vcpu, vcpu->arch.walk_mmu,
> - vcpu->arch.access, access)) {
> + vcpu->arch.access, 0, access)) {
> *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
> (gva & (PAGE_SIZE - 1));
> trace_vcpu_match_mmio(gva, *gpa, write, false);
>
* Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic
From: Paolo Bonzini @ 2015-11-09 13:17 UTC
To: Huaitong Han, gleb; +Cc: kvm
On 09/11/2015 13:43, Paolo Bonzini wrote:
>
>
> On 09/11/2015 12:54, Huaitong Han wrote:
>> Protection keys define a new 4-bit protection key field (PKEY) in bits
>> 62:59 of the leaf entries of the page tables. The PKEY is an index into the
>> PKRU register (16 domains); every domain has 2 bits (a write-disable bit and
>> an access-disable bit).
>>
>> The static logic is handled in update_permission_bitmask; the dynamic logic
>> needs to read the pkey from the page table entries, read the PKRU value, and
>> deduce the correct result.
>>
>> Signed-off-by: Huaitong Han <huaitong.han@intel.com>
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index e4202e4..bbb5555 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -3,6 +3,7 @@
>>
>> #include <linux/kvm_host.h>
>> #include "kvm_cache_regs.h"
>> +#include "x86.h"
>>
>> #define PT64_PT_BITS 9
>> #define PT64_ENT_PER_PAGE (1 << PT64_PT_BITS)
>> @@ -24,6 +25,11 @@
>> #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
>> #define PT_PAT_MASK (1ULL << 7)
>> #define PT_GLOBAL_MASK (1ULL << 8)
>> +
>> +#define PT64_PKEY_BIT0 (1ULL << _PAGE_BIT_PKEY_BIT0)
>> +#define PT64_PKEY_BIT1 (1ULL << _PAGE_BIT_PKEY_BIT1)
>> +#define PT64_PKEY_BIT2 (1ULL << _PAGE_BIT_PKEY_BIT2)
>> +#define PT64_PKEY_BIT3 (1ULL << _PAGE_BIT_PKEY_BIT3)
>> #define PT64_NX_SHIFT 63
>> #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
>>
>> @@ -45,6 +51,15 @@
>> #define PT_PAGE_TABLE_LEVEL 1
>> #define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)
>>
>> +#define PKEYS_BIT0_VALUE (1ULL << 0)
>> +#define PKEYS_BIT1_VALUE (1ULL << 1)
>> +#define PKEYS_BIT2_VALUE (1ULL << 2)
>> +#define PKEYS_BIT3_VALUE (1ULL << 3)
>> +
>> +#define PKRU_READ 0
>> +#define PKRU_WRITE 1
>> +#define PKRU_ATTRS 2
>> +
>> static inline u64 rsvd_bits(int s, int e)
>> {
>> return ((1ULL << (e - s + 1)) - 1) << s;
>> @@ -145,10 +160,44 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
>> * fault with the given access (in ACC_* format)?
>> */
>> static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>> - unsigned pte_access, unsigned pfec)
>> + unsigned pte_access, unsigned pte_pkeys, unsigned pfec)
>> {
>> - int cpl = kvm_x86_ops->get_cpl(vcpu);
>> - unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>> + unsigned long smap, rflags;
>> + u32 pkru;
>> + int cpl, index;
>> + bool wf, uf, pk, pkru_ad, pkru_wd;
>> +
>> + cpl = kvm_x86_ops->get_cpl(vcpu);
>> + rflags = kvm_x86_ops->get_rflags(vcpu);
>> +
>> + pkru = read_pkru();
>> +
>> + /*
>> + * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
>> + * domain in pkru, pkey is index to a defined domain, so the value of
>> + * pkey * PKRU_ATTRS + R/W is offset of a defined domain attribute.
>> + */
>> + pkru_ad = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_READ)) & 1;
>> + pkru_wd = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_WRITE)) & 1;
>> +
>> + wf = pfec & PFERR_WRITE_MASK;
>> + uf = pfec & PFERR_USER_MASK;
>> +
>> + /*
>> + * PKeys 2nd and 6th conditions:
>> + * 2.EFER_LMA=1
>> + * 6.PKRU.AD=1
>> + * or The access is a data write and PKRU.WD=1 and
>> + * either CR0.WP=1 or it is a user mode access
>> + */
>> + pk = is_long_mode(vcpu) && (pkru_ad ||
>> + (pkru_wd && wf && (is_write_protection(vcpu) || uf)));
A little more optimized:
pkru_bits = (pkru >> (pte_pkeys * PKRU_ATTRS)) & 3;
/*
* Ignore PKRU.WD if not relevant to this access (a read,
* or a supervisor mode access if CR0.WP=0).
*/
if (!wf || (!uf && !is_write_protection(vcpu)))
pkru_bits &= ~(1 << PKRU_WRITE);
... and then just check pkru_bits != 0.
>> + /*
>> + * PK bit right value in pfec equal to
>> + * PK bit current value in pfec and pk value.
>> + */
>> + pfec &= (pk << PFERR_PK_BIT) + ~PFERR_PK_MASK;
>
> PK is only applicable to guest page tables, but if you do not support
> PKRU without EPT (patch 9), none of this is necessary, is it?
Doh. :( Sorry, this is of course needed for the emulation case.
I think you should optimize this for the common case where pkru is zero,
hence pk will always be zero. So something like
pkru = is_long_mode(vcpu) ? read_pkru() : 0;
if (unlikely(pkru) && (pfec & PFERR_PK_MASK)) {
... from above ... */
/* Flip PFERR_PK_MASK if pkru_bits is non-zero */
pfec ^= -pkru_bits & PFERR_PK_MASK;
}
Paolo
>> /*
>> * If CPL < 3, SMAP prevention are disabled if EFLAGS.AC = 1.
>> @@ -163,8 +212,8 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>> * but it will be one in index if SMAP checks are being overridden.
>> * It is important to keep this branchless.
>> */
>> - unsigned long smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
>> - int index = (pfec >> 1) +
>> + smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
>> + index = (pfec >> 1) +
>> (smap >> (X86_EFLAGS_AC_BIT - PFERR_RSVD_BIT + 1));
>>
>> WARN_ON(pfec & PFERR_RSVD_MASK);
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 736e6ab..99563bc 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -253,6 +253,17 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>> }
>> return 0;
>> }
>> +static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte)
>> +{
>> + unsigned pkeys = 0;
>> +#if PTTYPE == 64
>> + pkeys = ((gpte & PT64_PKEY_BIT0) ? PKEYS_BIT0_VALUE : 0) |
>> + ((gpte & PT64_PKEY_BIT1) ? PKEYS_BIT1_VALUE : 0) |
>> + ((gpte & PT64_PKEY_BIT2) ? PKEYS_BIT2_VALUE : 0) |
>> + ((gpte & PT64_PKEY_BIT3) ? PKEYS_BIT3_VALUE : 0);
>
> This is just pkeys = (gpte >> _PAGE_BIT_PKEY_BIT0) & 15.
>
> Paolo
>
>> +#endif
>> + return pkeys;
>> +}
>>
>> /*
>> * Fetch a guest pte for a guest virtual address
>> @@ -265,12 +276,13 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>> pt_element_t pte;
>> pt_element_t __user *uninitialized_var(ptep_user);
>> gfn_t table_gfn;
>> - unsigned index, pt_access, pte_access, accessed_dirty;
>> + unsigned index, pt_access, pte_access, accessed_dirty, pte_pkeys;
>> gpa_t pte_gpa;
>> int offset;
>> const int write_fault = access & PFERR_WRITE_MASK;
>> const int user_fault = access & PFERR_USER_MASK;
>> const int fetch_fault = access & PFERR_FETCH_MASK;
>> + const int pk_fault = access & PFERR_PK_MASK;
>> u16 errcode = 0;
>> gpa_t real_gpa;
>> gfn_t gfn;
>> @@ -356,7 +368,9 @@ retry_walk:
>> walker->ptes[walker->level - 1] = pte;
>> } while (!is_last_gpte(mmu, walker->level, pte));
>>
>> - if (unlikely(permission_fault(vcpu, mmu, pte_access, access))) {
>> + pte_pkeys = FNAME(gpte_pkeys)(vcpu, pte);
>> + if (unlikely(permission_fault(vcpu, mmu, pte_access, pte_pkeys,
>> + access))) {
>> errcode |= PFERR_PRESENT_MASK;
>> goto error;
>> }
>> @@ -399,7 +413,7 @@ retry_walk:
>> return 1;
>>
>> error:
>> - errcode |= write_fault | user_fault;
>> + errcode |= write_fault | user_fault | pk_fault;
>> if (fetch_fault && (mmu->nx ||
>> kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
>> errcode |= PFERR_FETCH_MASK;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5181834..7a84b83 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4107,7 +4107,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>>
>> if (vcpu_match_mmio_gva(vcpu, gva)
>> && !permission_fault(vcpu, vcpu->arch.walk_mmu,
>> - vcpu->arch.access, access)) {
>> + vcpu->arch.access, 0, access)) {
>> *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
>> (gva & (PAGE_SIZE - 1));
>> trace_vcpu_match_mmio(gva, *gpa, write, false);
>>
* Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic
From: Han, Huaitong @ 2015-11-10 9:28 UTC
To: pbonzini@redhat.com; +Cc: gleb@kernel.org, kvm@vger.kernel.org
On Mon, 2015-11-09 at 14:17 +0100, Paolo Bonzini wrote:
> > > static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > > - unsigned pte_access, unsigned pfec)
> > > + unsigned pte_access, unsigned pte_pkeys, unsigned pfec)
> > > {
> > > - int cpl = kvm_x86_ops->get_cpl(vcpu);
> > > - unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> > > + unsigned long smap, rflags;
> > > + u32 pkru;
> > > + int cpl, index;
> > > + bool wf, uf, pk, pkru_ad, pkru_wd;
> > > +
> > > + cpl = kvm_x86_ops->get_cpl(vcpu);
> > > + rflags = kvm_x86_ops->get_rflags(vcpu);
> > > +
> > > + pkru = read_pkru();
> > > +
> > > + /*
> > > + * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
> > > + * domain in pkru, pkey is index to a defined domain, so the value of
> > > + * pkey * PKRU_ATTRS + R/W is offset of a defined domain attribute.
> > > + */
> > > + pkru_ad = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_READ)) & 1;
> > > + pkru_wd = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_WRITE)) & 1;
> > > +
> > > + wf = pfec & PFERR_WRITE_MASK;
> > > + uf = pfec & PFERR_USER_MASK;
> > > +
> > > + /*
> > > + * PKeys 2nd and 6th conditions:
> > > + * 2.EFER_LMA=1
> > > + * 6.PKRU.AD=1
> > > + * or The access is a data write and PKRU.WD=1 and
> > > + * either CR0.WP=1 or it is a user mode access
> > > + */
> > > + pk = is_long_mode(vcpu) && (pkru_ad ||
> > > + (pkru_wd && wf && (is_write_protection(vcpu) || uf)));
>
> A little more optimized:
>
> pkru_bits = (pkru >> (pte_pkeys * PKRU_ATTRS)) & 3;
>
> /*
> * Ignore PKRU.WD if not relevant to this access (a read,
> * or a supervisor mode access if CR0.WP=0).
> */
> if (!wf || (!uf && !is_write_protection(vcpu)))
> pkru_bits &= ~(1 << PKRU_WRITE);
>
> ... and then just check pkru_bits != 0.
>
> > > + /*
> > > + * PK bit right value in pfec equal to
> > > + * PK bit current value in pfec and pk value.
> > > + */
> > > + pfec &= (pk << PFERR_PK_BIT) + ~PFERR_PK_MASK;
> >
> > PK is only applicable to guest page tables, but if you do not support
> > PKRU without EPT (patch 9), none of this is necessary, is it?
>
> Doh. :( Sorry, this is of course needed for the emulation case.
>
> I think you should optimize this for the common case where pkru is zero,
> hence pk will always be zero. So something like
>
> pkru = is_long_mode(vcpu) ? read_pkru() : 0;
> if (unlikely(pkru) && (pfec & PFERR_PK_MASK)) {
> ... from above ... */
>
> /* Flip PFERR_PK_MASK if pkru_bits is non-zero */
> pfec ^= -pkru_bits & PFERR_PK_MASK;
If pkru_bits is zero, the dynamic conditions for a protection-key violation
are not met, so the PK bit in pfec should be flipped. So I guess it should
be:
pfec ^= pkru_bits ? 0 : PFERR_PK_MASK;
> }
>
> Paolo
>
* Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic
From: Paolo Bonzini @ 2015-11-10 9:35 UTC
To: Han, Huaitong; +Cc: gleb@kernel.org, kvm@vger.kernel.org
On 10/11/2015 10:28, Han, Huaitong wrote:
> > pkru = is_long_mode(vcpu) ? read_pkru() : 0;
> > if (unlikely(pkru) && (pfec & PFERR_PK_MASK)) {
> > ... from above ... */
> >
> > /* Flip PFERR_PK_MASK if pkru_bits is non-zero */
> > pfec ^= -pkru_bits & PFERR_PK_MASK;
>
> If pkru_bits is zero, the dynamic conditions for a protection-key violation
> are not met, so the PK bit in pfec should be flipped. So I guess it should
> be:
> pfec ^= pkru_bits ? 0 : PFERR_PK_MASK;
Right.
Paolo
* [PATCH 7/9] KVM, pkeys: Add pkeys support for gva_to_gpa functions
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch adds pkeys support for the gva_to_gpa functions: PFERR_PK_MASK is
set in the access mask when the vCPU is in long mode with CR4.PKE enabled.
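The same three lines are added to each helper below; a hypothetical cleanup
(my suggestion, not something in the series) would factor them into one
helper, for example:

static u32 kvm_pk_access(struct kvm_vcpu *vcpu, u32 access)
{
        /* PFEC.PK is only meaningful in long mode with CR4.PKE set */
        if (is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE))
                access |= PFERR_PK_MASK;
        return access;
}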
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a84b83..6e9156d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3960,6 +3960,8 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception)
{
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
}
@@ -3968,6 +3970,8 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
{
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_FETCH_MASK;
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
}
@@ -3976,6 +3980,8 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
{
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_WRITE_MASK;
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
}
@@ -4026,10 +4032,14 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
unsigned offset;
int ret;
+ gpa_t gpa;
+
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
/* Inline kvm_read_guest_virt_helper for speed. */
- gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK,
- exception);
+ gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr,
+ access | PFERR_FETCH_MASK, exception);
if (unlikely(gpa == UNMAPPED_GVA))
return X86EMUL_PROPAGATE_FAULT;
@@ -4050,6 +4060,8 @@ int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
exception);
@@ -4073,9 +4085,14 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
void *data = val;
int r = X86EMUL_CONTINUE;
+ u32 access = PFERR_WRITE_MASK;
+
+ access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
+ ? PFERR_PK_MASK : 0;
+
while (bytes) {
gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr,
- PFERR_WRITE_MASK,
+ access,
exception);
unsigned offset = addr & (PAGE_SIZE-1);
unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
--
2.4.3
* Re: [PATCH 7/9] KVM, pkeys: Add pkeys support for gva_to_gpa functions
From: Paolo Bonzini @ 2015-11-09 13:23 UTC
To: Huaitong Han, gleb; +Cc: kvm
On 09/11/2015 12:54, Huaitong Han wrote:
> index 7a84b83..6e9156d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3960,6 +3960,8 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
> struct x86_exception *exception)
> {
> u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> + access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
> + ? PFERR_PK_MASK : 0;
> return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
I think checking is_long_mode is not necessary here, since is_long_mode
is not checked in update_permission_bitmask but (dynamically) in
permission_fault.
>
> + gpa_t gpa;
> +
> + access |= is_long_mode(vcpu) && kvm_read_cr4_bits(vcpu, X86_CR4_PKE)
> + ? PFERR_PK_MASK : 0;
Fetches never have PFERR_PK_MASK set.
Thanks,
Paolo
> /* Inline kvm_read_guest_virt_helper for speed. */
> - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK,
> - exception);
> + gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr,
> + access | PFERR_FETCH_MASK, exception);
* [PATCH 8/9] KVM, pkeys: add pkeys support for xsave state
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch adds pkeys support for XSAVE state by including
XFEATURE_MASK_PKRU in KVM_SUPPORTED_XCR0.
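For background (my sketch, not from the patch): PKRU is XSAVE state
component 9, so XFEATURE_MASK_PKRU is bit 9 of XCR0, and a guest can confirm
that its OS enabled the component by reading XCR0 with XGETBV (ECX=0),
assuming CR4.OSXSAVE is already set:

static inline int pkru_xsave_enabled(void)
{
        uint32_t eax, edx;

        /* XGETBV with ECX=0 returns XCR0 in EDX:EAX */
        asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
        return (eax >> 9) & 1;          /* XFEATURE_MASK_PKRU */
}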
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f2afa5f..0f71d5d 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -182,7 +182,8 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
- | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512)
+ | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
+ | XFEATURE_MASK_PKRU)
extern u64 host_xcr0;
extern u64 kvm_supported_xcr0(void);
--
2.4.3
* [PATCH 9/9] KVM, pkeys: disable PKU feature without ept
From: Huaitong Han @ 2015-11-09 11:54 UTC
To: pbonzini, gleb; +Cc: kvm, Huaitong Han
This patch disables CPUID:PKU when EPT is disabled, since the series does
not support pkeys under shadow paging.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index ece687b..e1113ae 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -447,6 +447,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->ebx |= F(TSC_ADJUST);
entry->ecx &= kvm_supported_word11_x86_features;
cpuid_mask(&entry->ecx, 13);
+ if (!tdp_enabled)
+ entry->ecx &= ~F(PKU);
} else {
entry->ebx = 0;
entry->ecx = 0;
--
2.4.3
* Re: [PATCH 0/9] KVM, pkeys: add memory protection-key support
From: Paolo Bonzini @ 2015-11-09 13:26 UTC
To: Huaitong Han, gleb; +Cc: kvm
On 09/11/2015 12:54, Huaitong Han wrote:
> The protection-key feature provides an additional mechanism by which IA-32e
> paging controls access to usermode addresses.
>
> Hardware support for protection keys for user pages is enumerated with CPUID
> feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
> with the setting of CR4.PKE (bit 22).
>
> When CR4.PKE = 1, every linear address is associated with the 4-bit protection
> key located in bits 62:59 of the paging-structure entry that mapped the page
> containing the linear address. The PKRU register determines, for each
> protection key, whether user-mode addresses with that protection key may be
> read or written.
>
> The PKRU register (protection key rights for user pages) is a 32-bit register
> with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
> access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
> bit for protection key i (WDi).
>
> Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
> write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
> be read and written by instructions in the XSAVE feature set.
Hi, this looks more or less okay. I made a few comments on the
individual patches.
Please add a test for PKRU to kvm-unit-tests' access.c. I will _not_
merge this feature without unit tests. I have merged nested VPID without,
and it was a mistake because they were never submitted and probably never
will be.
Thanks,
Paolo