* [patch 0/3] [RFC] support read-only mappings
@ 2010-10-19 16:26 Marcelo Tosatti
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT
2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti
@ 2010-10-19 16:26 ` Marcelo Tosatti
2010-10-20 10:24 ` Avi Kivity
2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: vmx-remove-base-ptes --]
[-- Type: text/plain, Size: 1145 bytes --]
The EPT present/writable bits use the same position as normal
pagetable bits.
Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -3483,7 +3483,7 @@ static int handle_ept_violation(struct k
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
trace_kvm_page_fault(gpa, exit_qualification);
- return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
+ return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3);
}
static u64 ept_rsvd_mask(u64 spte, int level)
@@ -4408,8 +4408,6 @@ static int __init vmx_init(void)
if (enable_ept) {
bypass_guest_pf = 0;
- kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
- VMX_EPT_WRITABLE_MASK);
kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
VMX_EPT_EXECUTABLE_MASK);
kvm_enable_tdp();
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite
2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
@ 2010-10-19 16:26 ` Marcelo Tosatti
2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
3 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: mmu-flush-tlb-on-overwrite --]
[-- Type: text/plain, Size: 1074 bytes --]
This can happen in the following scenario:
vcpu0 vcpu1
read fault
gup(.write=0)
gup(.write=1)
reuse swap cache, no COW
set writable spte
use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2075,6 +2075,16 @@ static void mmu_set_spte(struct kvm_vcpu
spte_to_pfn(*sptep), pfn);
drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
kvm_flush_remote_tlbs(vcpu->kvm);
+ /*
+ * If we overwrite a writable spte with a read-only one,
+ * drop it and flush remote TLBs. Otherwise rmap_write_protect
+ * will find a read-only spte, even though the writable spte
+ * might be cached on a CPU's TLB.
+ */
+ } else if (is_writable_pte(*sptep) &&
+ (!(pte_access & ACC_WRITE_MASK) || !dirty)) {
+ drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
+ kvm_flush_remote_tlbs(vcpu->kvm);
} else
was_rmapped = 1;
}
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings
2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
@ 2010-10-19 16:26 ` Marcelo Tosatti
2010-10-20 10:36 ` Avi Kivity
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
3 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: gfn-to-pfn-read-only --]
[-- Type: text/plain, Size: 10305 bytes --]
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2081,8 +2081,7 @@ static void mmu_set_spte(struct kvm_vcpu
* will find a read-only spte, even though the writable spte
* might be cached on a CPU's TLB.
*/
- } else if (is_writable_pte(*sptep) &&
- (!(pte_access & ACC_WRITE_MASK) || !dirty)) {
+ } else if (is_writable_pte(*sptep) && !dirty) {
drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
kvm_flush_remote_tlbs(vcpu->kvm);
} else
@@ -2222,7 +2221,7 @@ static void direct_pte_prefetch(struct k
}
static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
- int level, gfn_t gfn, pfn_t pfn)
+ int map_writable, int level, gfn_t gfn, pfn_t pfn)
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -2231,9 +2230,13 @@ static int __direct_map(struct kvm_vcpu
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
- mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
+ unsigned pte_access = ACC_ALL;
+
+ if (!map_writable)
+ pte_access &= ~ACC_WRITE_MASK;
+ mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
0, write, 1, &pt_write,
- level, gfn, pfn, false, true);
+ level, gfn, pfn, false, map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
@@ -2294,6 +2297,7 @@ static int nonpaging_map(struct kvm_vcpu
int level;
pfn_t pfn;
unsigned long mmu_seq;
+ bool map_writable;
level = mapping_level(vcpu, gfn);
@@ -2308,7 +2312,7 @@ static int nonpaging_map(struct kvm_vcpu
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, &map_writable);
/* mmio */
if (is_error_pfn(pfn))
@@ -2318,7 +2322,7 @@ static int nonpaging_map(struct kvm_vcpu
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
- r = __direct_map(vcpu, v, write, level, gfn, pfn);
+ r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2616,11 +2620,11 @@ static bool can_do_async_pf(struct kvm_v
}
static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
- gva_t gva, pfn_t *pfn)
+ gva_t gva, pfn_t *pfn, bool write, bool *writable)
{
bool async;
- *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+ *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable);
if (!async)
return false; /* *pfn has correct page already */
@@ -2637,7 +2641,7 @@ static bool try_async_pf(struct kvm_vcpu
return true;
}
- *pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ *pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable);
return false;
}
@@ -2650,6 +2654,8 @@ static int tdp_page_fault(struct kvm_vcp
int level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
+ int write = error_code & PFERR_WRITE_MASK;
+ bool map_writable;
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -2665,7 +2671,7 @@ static int tdp_page_fault(struct kvm_vcp
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn))
+ if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable))
return 0;
/* mmio */
@@ -2675,7 +2681,7 @@ static int tdp_page_fault(struct kvm_vcp
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
- r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
+ r = __direct_map(vcpu, gpa, write, map_writable,
level, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -427,7 +427,7 @@ static void FNAME(pte_prefetch)(struct k
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct guest_walker *gw,
int user_fault, int write_fault, int hlevel,
- int *ptwrite, pfn_t pfn)
+ int *ptwrite, pfn_t pfn, bool map_writable)
{
unsigned access = gw->pt_access;
struct kvm_mmu_page *sp = NULL;
@@ -501,7 +501,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
user_fault, write_fault, dirty, ptwrite, it.level,
- gw->gfn, pfn, false, true);
+ gw->gfn, pfn, false, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
return it.sptep;
@@ -539,6 +539,7 @@ static int FNAME(page_fault)(struct kvm_
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
unsigned long mmu_seq;
+ bool map_writable;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
@@ -569,13 +570,17 @@ static int FNAME(page_fault)(struct kvm_
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn))
+ if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn, write_fault,
+ &map_writable))
return 0;
/* mmio */
if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
+ if (!map_writable)
+ walker.pte_access &= ~ACC_WRITE_MASK;
+
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
@@ -583,7 +588,7 @@ static int FNAME(page_fault)(struct kvm_
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
kvm_mmu_free_some_pages(vcpu);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
- level, &write_pt, pfn);
+ level, &write_pt, pfn, map_writable);
(void)sptep;
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
sptep, *sptep, write_pt);
Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -334,8 +334,11 @@ void kvm_set_page_accessed(struct page *
pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr);
pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async);
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
+ bool write_fault, bool *writable);
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
+ bool *writable);
pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn);
int memslot_id(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -955,7 +955,7 @@ unsigned long gfn_to_hva(struct kvm *kvm
EXPORT_SYMBOL_GPL(gfn_to_hva);
static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
- bool *async)
+ bool *async, bool write_fault, bool *writable)
{
struct page *page[1];
int npages = 0;
@@ -964,19 +964,42 @@ static pfn_t hva_to_pfn(struct kvm *kvm,
/* we can do it either atomically or asynchronously, not both */
BUG_ON(atomic && async);
+ BUG_ON(!write_fault && !writable);
+
+ if (writable)
+ *writable = true;
+
if (atomic || async)
npages = __get_user_pages_fast(addr, 1, 1, page);
if (unlikely(npages != 1) && !atomic) {
might_sleep();
+ if (writable)
+ *writable = write_fault;
+
if (async) {
down_read(¤t->mm->mmap_sem);
npages = get_user_pages_noio(current, current->mm,
- addr, 1, 1, 0, page, NULL);
+ addr, 1, write_fault, 0,
+ page, NULL);
up_read(¤t->mm->mmap_sem);
} else
- npages = get_user_pages_fast(addr, 1, 1, page);
+ npages = get_user_pages_fast(addr, 1, write_fault,
+ page);
+
+ /* map read fault as writable if possible */
+ if (unlikely(!write_fault) && npages == 1) {
+ struct page *wpage[1];
+
+ npages = __get_user_pages_fast(addr, 1, 1, wpage);
+ if (npages == 1) {
+ *writable = true;
+ put_page(page[0]);
+ page[0] = wpage[0];
+ }
+ npages = 1;
+ }
}
if (unlikely(npages != 1)) {
@@ -1016,11 +1039,12 @@ return_fault_page:
pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr)
{
- return hva_to_pfn(kvm, addr, true, NULL);
+ return hva_to_pfn(kvm, addr, true, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(hva_to_pfn_atomic);
-static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async)
+static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
+ bool write_fault, bool *writable)
{
unsigned long addr;
@@ -1033,32 +1057,40 @@ static pfn_t __gfn_to_pfn(struct kvm *kv
return page_to_pfn(bad_page);
}
- return hva_to_pfn(kvm, addr, atomic, async);
+ return hva_to_pfn(kvm, addr, atomic, async, write_fault, writable);
}
pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, true, NULL);
+ return __gfn_to_pfn(kvm, gfn, true, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic);
-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async)
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
+ bool write_fault, bool *writable)
{
- return __gfn_to_pfn(kvm, gfn, false, async);
+ return __gfn_to_pfn(kvm, gfn, false, async, write_fault, writable);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_async);
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, false, NULL);
+ return __gfn_to_pfn(kvm, gfn, false, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn);
+pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
+ bool *writable)
+{
+ return __gfn_to_pfn(kvm, gfn, false, NULL, write_fault, writable);
+}
+EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
+
pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn)
{
unsigned long addr = gfn_to_hva_memslot(slot, gfn);
- return hva_to_pfn(kvm, addr, false, NULL);
+ return hva_to_pfn(kvm, addr, false, NULL, true, NULL);
}
int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
@ 2010-10-20 10:24 ` Avi Kivity
0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-10-20 10:24 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, aarcange
On 10/19/2010 06:26 PM, Marcelo Tosatti wrote:
> The EPT present/writable bits use the same position as normal
> pagetable bits.
>
> Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
> the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
>
> Also pass present/writable error code information from EPT violation
> to generic pagefault handler.
>
> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>
> Index: kvm/arch/x86/kvm/vmx.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx.c
> +++ kvm/arch/x86/kvm/vmx.c
> @@ -3483,7 +3483,7 @@ static int handle_ept_violation(struct k
>
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> trace_kvm_page_fault(gpa, exit_qualification);
> - return kvm_mmu_page_fault(vcpu, gpa& PAGE_MASK, 0);
> + return kvm_mmu_page_fault(vcpu, gpa, exit_qualification& 0x3);
> }
Why in the same patch? Seems unrelated.
Ah, it's actually not unrelated, it won't work without it.
>
> static u64 ept_rsvd_mask(u64 spte, int level)
> @@ -4408,8 +4408,6 @@ static int __init vmx_init(void)
>
> if (enable_ept) {
> bypass_guest_pf = 0;
> - kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
> - VMX_EPT_WRITABLE_MASK);
> kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
> VMX_EPT_EXECUTABLE_MASK);
> kvm_enable_tdp();
>
Only caller gone, please remove callee.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings
2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti
@ 2010-10-20 10:36 ` Avi Kivity
0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-10-20 10:36 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, aarcange
On 10/19/2010 06:26 PM, Marcelo Tosatti wrote:
> As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
> to writable if host pte allows it.
>
Looks good.
>
> static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
> - bool *async)
> + bool *async, bool write_fault, bool *writable)
These sort of changes make reading the callers impossible, lots of
true/false and NULL parameters which we have no idea what they mean.
We should (later) change the functions to accept a structure with named
fields for these parameters instead.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 0/4] support read-only mappings (v2)
2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti
` (2 preceding siblings ...)
2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti
@ 2010-10-22 16:18 ` Marcelo Tosatti
2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
` (4 more replies)
3 siblings, 5 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange
v2:
- remove kvm_mmu_set_base_ptes
- fix mmu_set_spte overwrite thinko
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
@ 2010-10-22 16:18 ` Marcelo Tosatti
2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: vmx-remove-base-ptes --]
[-- Type: text/plain, Size: 1145 bytes --]
The EPT present/writable bits use the same position as normal
pagetable bits.
Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -3477,7 +3477,7 @@ static int handle_ept_violation(struct k
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
trace_kvm_page_fault(gpa, exit_qualification);
- return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
+ return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3);
}
static u64 ept_rsvd_mask(u64 spte, int level)
@@ -4415,8 +4415,6 @@ static int __init vmx_init(void)
if (enable_ept) {
bypass_guest_pf = 0;
- kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
- VMX_EPT_WRITABLE_MASK);
kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
VMX_EPT_EXECUTABLE_MASK);
kvm_enable_tdp();
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
@ 2010-10-22 16:18 ` Marcelo Tosatti
2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: vmx-remove-set-base-ptes --]
[-- Type: text/plain, Size: 1661 bytes --]
Unused.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -196,7 +196,6 @@ static struct percpu_counter kvm_total_u
static u64 __read_mostly shadow_trap_nonpresent_pte;
static u64 __read_mostly shadow_notrap_nonpresent_pte;
-static u64 __read_mostly shadow_base_present_pte;
static u64 __read_mostly shadow_nx_mask;
static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
@@ -215,12 +214,6 @@ void kvm_mmu_set_nonpresent_ptes(u64 tra
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
-void kvm_mmu_set_base_ptes(u64 base_pte)
-{
- shadow_base_present_pte = base_pte;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
-
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask)
{
@@ -1974,7 +1967,7 @@ static int set_spte(struct kvm_vcpu *vcp
* whether the guest actually used the pte (in order to detect
* demand paging).
*/
- spte = shadow_base_present_pte;
+ spte = PT_PRESENT_MASK;
if (!speculative)
spte |= shadow_accessed_mask;
if (!dirty)
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -4688,7 +4688,6 @@ int kvm_arch_init(void *opaque)
kvm_x86_ops = ops;
kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
- kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0);
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti
@ 2010-10-22 16:18 ` Marcelo Tosatti
2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti
2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity
4 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: mmu-flush-tlb-on-overwrite --]
[-- Type: text/plain, Size: 1074 bytes --]
This can happen in the following scenario:
vcpu0 vcpu1
read fault
gup(.write=0)
gup(.write=1)
reuse swap cache, no COW
set writable spte
use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2068,6 +2068,16 @@ static void mmu_set_spte(struct kvm_vcpu
spte_to_pfn(*sptep), pfn);
drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
kvm_flush_remote_tlbs(vcpu->kvm);
+ /*
+ * If we overwrite a writable spte with a read-only one,
+ * drop it and flush remote TLBs. Otherwise rmap_write_protect
+ * will find a read-only spte, even though the writable spte
+ * might be cached on a CPU's TLB.
+ */
+ } else if (is_writable_pte(*sptep) &&
+ (!(pte_access & ACC_WRITE_MASK) || !dirty)) {
+ drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
+ kvm_flush_remote_tlbs(vcpu->kvm);
} else
was_rmapped = 1;
}
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
` (2 preceding siblings ...)
2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
@ 2010-10-22 16:18 ` Marcelo Tosatti
2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity
4 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange, Marcelo Tosatti
[-- Attachment #1: gfn-to-pfn-read-only --]
[-- Type: text/plain, Size: 9862 bytes --]
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2215,7 +2215,7 @@ static void direct_pte_prefetch(struct k
}
static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
- int level, gfn_t gfn, pfn_t pfn)
+ int map_writable, int level, gfn_t gfn, pfn_t pfn)
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -2224,9 +2224,13 @@ static int __direct_map(struct kvm_vcpu
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
- mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
+ unsigned pte_access = ACC_ALL;
+
+ if (!map_writable)
+ pte_access &= ~ACC_WRITE_MASK;
+ mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
0, write, 1, &pt_write,
- level, gfn, pfn, false, true);
+ level, gfn, pfn, false, map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
@@ -2287,6 +2291,7 @@ static int nonpaging_map(struct kvm_vcpu
int level;
pfn_t pfn;
unsigned long mmu_seq;
+ bool map_writable;
level = mapping_level(vcpu, gfn);
@@ -2301,7 +2306,7 @@ static int nonpaging_map(struct kvm_vcpu
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, &map_writable);
/* mmio */
if (is_error_pfn(pfn))
@@ -2311,7 +2316,7 @@ static int nonpaging_map(struct kvm_vcpu
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
- r = __direct_map(vcpu, v, write, level, gfn, pfn);
+ r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2609,11 +2614,11 @@ static bool can_do_async_pf(struct kvm_v
}
static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
- gva_t gva, pfn_t *pfn)
+ gva_t gva, pfn_t *pfn, bool write, bool *writable)
{
bool async;
- *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+ *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable);
if (!async)
return false; /* *pfn has correct page already */
@@ -2630,7 +2635,7 @@ static bool try_async_pf(struct kvm_vcpu
return true;
}
- *pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ *pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable);
return false;
}
@@ -2643,6 +2648,8 @@ static int tdp_page_fault(struct kvm_vcp
int level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
+ int write = error_code & PFERR_WRITE_MASK;
+ bool map_writable;
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -2658,7 +2665,7 @@ static int tdp_page_fault(struct kvm_vcp
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn))
+ if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable))
return 0;
/* mmio */
@@ -2668,7 +2675,7 @@ static int tdp_page_fault(struct kvm_vcp
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
- r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
+ r = __direct_map(vcpu, gpa, write, map_writable,
level, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -427,7 +427,7 @@ static void FNAME(pte_prefetch)(struct k
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct guest_walker *gw,
int user_fault, int write_fault, int hlevel,
- int *ptwrite, pfn_t pfn)
+ int *ptwrite, pfn_t pfn, bool map_writable)
{
unsigned access = gw->pt_access;
struct kvm_mmu_page *sp = NULL;
@@ -501,7 +501,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
user_fault, write_fault, dirty, ptwrite, it.level,
- gw->gfn, pfn, false, true);
+ gw->gfn, pfn, false, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
return it.sptep;
@@ -539,6 +539,7 @@ static int FNAME(page_fault)(struct kvm_
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
unsigned long mmu_seq;
+ bool map_writable;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
@@ -569,13 +570,17 @@ static int FNAME(page_fault)(struct kvm_
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn))
+ if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn, write_fault,
+ &map_writable))
return 0;
/* mmio */
if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
+ if (!map_writable)
+ walker.pte_access &= ~ACC_WRITE_MASK;
+
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
@@ -583,7 +588,7 @@ static int FNAME(page_fault)(struct kvm_
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
kvm_mmu_free_some_pages(vcpu);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
- level, &write_pt, pfn);
+ level, &write_pt, pfn, map_writable);
(void)sptep;
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
sptep, *sptep, write_pt);
Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -334,8 +334,11 @@ void kvm_set_page_accessed(struct page *
pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr);
pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async);
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
+ bool write_fault, bool *writable);
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
+ bool *writable);
pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn);
int memslot_id(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -959,7 +959,7 @@ static pfn_t get_fault_pfn(void)
}
static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
- bool *async)
+ bool *async, bool write_fault, bool *writable)
{
struct page *page[1];
int npages = 0;
@@ -968,19 +968,42 @@ static pfn_t hva_to_pfn(struct kvm *kvm,
/* we can do it either atomically or asynchronously, not both */
BUG_ON(atomic && async);
+ BUG_ON(!write_fault && !writable);
+
+ if (writable)
+ *writable = true;
+
if (atomic || async)
npages = __get_user_pages_fast(addr, 1, 1, page);
if (unlikely(npages != 1) && !atomic) {
might_sleep();
+ if (writable)
+ *writable = write_fault;
+
if (async) {
down_read(¤t->mm->mmap_sem);
npages = get_user_pages_noio(current, current->mm,
- addr, 1, 1, 0, page, NULL);
+ addr, 1, write_fault, 0,
+ page, NULL);
up_read(¤t->mm->mmap_sem);
} else
- npages = get_user_pages_fast(addr, 1, 1, page);
+ npages = get_user_pages_fast(addr, 1, write_fault,
+ page);
+
+ /* map read fault as writable if possible */
+ if (unlikely(!write_fault) && npages == 1) {
+ struct page *wpage[1];
+
+ npages = __get_user_pages_fast(addr, 1, 1, wpage);
+ if (npages == 1) {
+ *writable = true;
+ put_page(page[0]);
+ page[0] = wpage[0];
+ }
+ npages = 1;
+ }
}
if (unlikely(npages != 1)) {
@@ -1018,11 +1041,12 @@ static pfn_t hva_to_pfn(struct kvm *kvm,
pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr)
{
- return hva_to_pfn(kvm, addr, true, NULL);
+ return hva_to_pfn(kvm, addr, true, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(hva_to_pfn_atomic);
-static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async)
+static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
+ bool write_fault, bool *writable)
{
unsigned long addr;
@@ -1035,32 +1059,40 @@ static pfn_t __gfn_to_pfn(struct kvm *kv
return page_to_pfn(bad_page);
}
- return hva_to_pfn(kvm, addr, atomic, async);
+ return hva_to_pfn(kvm, addr, atomic, async, write_fault, writable);
}
pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, true, NULL);
+ return __gfn_to_pfn(kvm, gfn, true, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic);
-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async)
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
+ bool write_fault, bool *writable)
{
- return __gfn_to_pfn(kvm, gfn, false, async);
+ return __gfn_to_pfn(kvm, gfn, false, async, write_fault, writable);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_async);
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, false, NULL);
+ return __gfn_to_pfn(kvm, gfn, false, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn);
+pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
+ bool *writable)
+{
+ return __gfn_to_pfn(kvm, gfn, false, NULL, write_fault, writable);
+}
+EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
+
pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn)
{
unsigned long addr = gfn_to_hva_memslot(slot, gfn);
- return hva_to_pfn(kvm, addr, false, NULL);
+ return hva_to_pfn(kvm, addr, false, NULL, true, NULL);
}
int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/4] support read-only mappings (v2)
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
` (3 preceding siblings ...)
2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti
@ 2010-10-27 9:20 ` Avi Kivity
4 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-10-27 9:20 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, aarcange
On 10/22/2010 06:18 PM, Marcelo Tosatti wrote:
> v2:
> - remove kvm_mmu_set_base_ptes
> - fix mmu_set_spte overwrite thinko
>
>
Applied, thanks.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-10-27 9:20 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
2010-10-20 10:24 ` Avi Kivity
2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti
2010-10-20 10:36 ` Avi Kivity
2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti
2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti
2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti
2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti
2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox