* [patch 0/3] [RFC] support read-only mappings
@ 2010-10-19 16:26 Marcelo Tosatti
2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw)
To: kvm; +Cc: avi, aarcange
^ permalink raw reply [flat|nested] 12+ messages in thread* [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT 2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti @ 2010-10-19 16:26 ` Marcelo Tosatti 2010-10-20 10:24 ` Avi Kivity 2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti ` (2 subsequent siblings) 3 siblings, 1 reply; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: vmx-remove-base-ptes --] [-- Type: text/plain, Size: 1145 bytes --] The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/vmx.c =================================================================== --- kvm.orig/arch/x86/kvm/vmx.c +++ kvm/arch/x86/kvm/vmx.c @@ -3483,7 +3483,7 @@ static int handle_ept_violation(struct k gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); trace_kvm_page_fault(gpa, exit_qualification); - return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0); + return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3); } static u64 ept_rsvd_mask(u64 spte, int level) @@ -4408,8 +4408,6 @@ static int __init vmx_init(void) if (enable_ept) { bypass_guest_pf = 0; - kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK | - VMX_EPT_WRITABLE_MASK); kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull, VMX_EPT_EXECUTABLE_MASK); kvm_enable_tdp(); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT 2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti @ 2010-10-20 10:24 ` Avi Kivity 0 siblings, 0 replies; 12+ messages in thread From: Avi Kivity @ 2010-10-20 10:24 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm, aarcange On 10/19/2010 06:26 PM, Marcelo Tosatti wrote: > The EPT present/writable bits use the same position as normal > pagetable bits. > > Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting > the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. > > Also pass present/writable error code information from EPT violation > to generic pagefault handler. > > Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com> > > Index: kvm/arch/x86/kvm/vmx.c > =================================================================== > --- kvm.orig/arch/x86/kvm/vmx.c > +++ kvm/arch/x86/kvm/vmx.c > @@ -3483,7 +3483,7 @@ static int handle_ept_violation(struct k > > gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); > trace_kvm_page_fault(gpa, exit_qualification); > - return kvm_mmu_page_fault(vcpu, gpa& PAGE_MASK, 0); > + return kvm_mmu_page_fault(vcpu, gpa, exit_qualification& 0x3); > } Why in the same patch? Seems unrelated. Ah, it's actually not unrelated, it won't work without it. > > static u64 ept_rsvd_mask(u64 spte, int level) > @@ -4408,8 +4408,6 @@ static int __init vmx_init(void) > > if (enable_ept) { > bypass_guest_pf = 0; > - kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK | > - VMX_EPT_WRITABLE_MASK); > kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull, > VMX_EPT_EXECUTABLE_MASK); > kvm_enable_tdp(); > Only caller gone, please remove callee. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite 2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti 2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti @ 2010-10-19 16:26 ` Marcelo Tosatti 2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti 3 siblings, 0 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: mmu-flush-tlb-on-overwrite --] [-- Type: text/plain, Size: 1074 bytes --] This can happen in the following scenario: vcpu0 vcpu1 read fault gup(.write=0) gup(.write=1) reuse swap cache, no COW set writable spte use writable spte set read-only spte Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/mmu.c =================================================================== --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2075,6 +2075,16 @@ static void mmu_set_spte(struct kvm_vcpu spte_to_pfn(*sptep), pfn); drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); kvm_flush_remote_tlbs(vcpu->kvm); + /* + * If we overwrite a writable spte with a read-only one, + * drop it and flush remote TLBs. Otherwise rmap_write_protect + * will find a read-only spte, even though the writable spte + * might be cached on a CPU's TLB. + */ + } else if (is_writable_pte(*sptep) && + (!(pte_access & ACC_WRITE_MASK) || !dirty)) { + drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(vcpu->kvm); } else was_rmapped = 1; } ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings 2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti 2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti 2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti @ 2010-10-19 16:26 ` Marcelo Tosatti 2010-10-20 10:36 ` Avi Kivity 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti 3 siblings, 1 reply; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-19 16:26 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: gfn-to-pfn-read-only --] [-- Type: text/plain, Size: 10305 bytes --] As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/mmu.c =================================================================== --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2081,8 +2081,7 @@ static void mmu_set_spte(struct kvm_vcpu * will find a read-only spte, even though the writable spte * might be cached on a CPU's TLB. */ - } else if (is_writable_pte(*sptep) && - (!(pte_access & ACC_WRITE_MASK) || !dirty)) { + } else if (is_writable_pte(*sptep) && !dirty) { drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); kvm_flush_remote_tlbs(vcpu->kvm); } else @@ -2222,7 +2221,7 @@ static void direct_pte_prefetch(struct k } static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int level, gfn_t gfn, pfn_t pfn) + int map_writable, int level, gfn_t gfn, pfn_t pfn) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -2231,9 +2230,13 @@ static int __direct_map(struct kvm_vcpu for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL, + unsigned pte_access = ACC_ALL; + + if (!map_writable) + pte_access &= ~ACC_WRITE_MASK; + mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access, 0, write, 1, &pt_write, - level, gfn, pfn, false, true); + level, gfn, pfn, false, map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; @@ -2294,6 +2297,7 @@ static int nonpaging_map(struct kvm_vcpu int level; pfn_t pfn; unsigned long mmu_seq; + bool map_writable; level = mapping_level(vcpu, gfn); @@ -2308,7 +2312,7 @@ static int nonpaging_map(struct kvm_vcpu mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - pfn = gfn_to_pfn(vcpu->kvm, gfn); + pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, &map_writable); /* mmio */ if (is_error_pfn(pfn)) @@ -2318,7 +2322,7 @@ static int nonpaging_map(struct kvm_vcpu if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; kvm_mmu_free_some_pages(vcpu); - r = __direct_map(vcpu, v, write, level, gfn, pfn); + r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn); spin_unlock(&vcpu->kvm->mmu_lock); @@ -2616,11 +2620,11 @@ static bool can_do_async_pf(struct kvm_v } static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, - gva_t gva, pfn_t *pfn) + gva_t gva, pfn_t *pfn, bool write, bool *writable) { bool async; - *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async); + *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable); if (!async) return false; /* *pfn has correct page already */ @@ -2637,7 +2641,7 @@ static bool try_async_pf(struct kvm_vcpu return true; } - *pfn = gfn_to_pfn(vcpu->kvm, gfn); + *pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable); return false; } @@ -2650,6 +2654,8 @@ static int tdp_page_fault(struct kvm_vcp int level; gfn_t gfn = gpa >> PAGE_SHIFT; unsigned long mmu_seq; + int write = error_code & PFERR_WRITE_MASK; + bool map_writable; ASSERT(vcpu); ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -2665,7 +2671,7 @@ static int tdp_page_fault(struct kvm_vcp mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn)) + if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable)) return 0; /* mmio */ @@ -2675,7 +2681,7 @@ static int tdp_page_fault(struct kvm_vcp if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; kvm_mmu_free_some_pages(vcpu); - r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, + r = __direct_map(vcpu, gpa, write, map_writable, level, gfn, pfn); spin_unlock(&vcpu->kvm->mmu_lock); Index: kvm/arch/x86/kvm/paging_tmpl.h =================================================================== --- kvm.orig/arch/x86/kvm/paging_tmpl.h +++ kvm/arch/x86/kvm/paging_tmpl.h @@ -427,7 +427,7 @@ static void FNAME(pte_prefetch)(struct k static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct guest_walker *gw, int user_fault, int write_fault, int hlevel, - int *ptwrite, pfn_t pfn) + int *ptwrite, pfn_t pfn, bool map_writable) { unsigned access = gw->pt_access; struct kvm_mmu_page *sp = NULL; @@ -501,7 +501,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access, user_fault, write_fault, dirty, ptwrite, it.level, - gw->gfn, pfn, false, true); + gw->gfn, pfn, false, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep); return it.sptep; @@ -539,6 +539,7 @@ static int FNAME(page_fault)(struct kvm_ pfn_t pfn; int level = PT_PAGE_TABLE_LEVEL; unsigned long mmu_seq; + bool map_writable; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); @@ -569,13 +570,17 @@ static int FNAME(page_fault)(struct kvm_ mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn)) + if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn, write_fault, + &map_writable)) return 0; /* mmio */ if (is_error_pfn(pfn)) return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); + if (!map_writable) + walker.pte_access &= ~ACC_WRITE_MASK; + spin_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; @@ -583,7 +588,7 @@ static int FNAME(page_fault)(struct kvm_ trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); kvm_mmu_free_some_pages(vcpu); sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, - level, &write_pt, pfn); + level, &write_pt, pfn, map_writable); (void)sptep; pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, sptep, *sptep, write_pt); Index: kvm/include/linux/kvm_host.h =================================================================== --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -334,8 +334,11 @@ void kvm_set_page_accessed(struct page * pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr); pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn); -pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async); +pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async, + bool write_fault, bool *writable); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); +pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, + bool *writable); pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); int memslot_id(struct kvm *kvm, gfn_t gfn); Index: kvm/virt/kvm/kvm_main.c =================================================================== --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -955,7 +955,7 @@ unsigned long gfn_to_hva(struct kvm *kvm EXPORT_SYMBOL_GPL(gfn_to_hva); static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic, - bool *async) + bool *async, bool write_fault, bool *writable) { struct page *page[1]; int npages = 0; @@ -964,19 +964,42 @@ static pfn_t hva_to_pfn(struct kvm *kvm, /* we can do it either atomically or asynchronously, not both */ BUG_ON(atomic && async); + BUG_ON(!write_fault && !writable); + + if (writable) + *writable = true; + if (atomic || async) npages = __get_user_pages_fast(addr, 1, 1, page); if (unlikely(npages != 1) && !atomic) { might_sleep(); + if (writable) + *writable = write_fault; + if (async) { down_read(¤t->mm->mmap_sem); npages = get_user_pages_noio(current, current->mm, - addr, 1, 1, 0, page, NULL); + addr, 1, write_fault, 0, + page, NULL); up_read(¤t->mm->mmap_sem); } else - npages = get_user_pages_fast(addr, 1, 1, page); + npages = get_user_pages_fast(addr, 1, write_fault, + page); + + /* map read fault as writable if possible */ + if (unlikely(!write_fault) && npages == 1) { + struct page *wpage[1]; + + npages = __get_user_pages_fast(addr, 1, 1, wpage); + if (npages == 1) { + *writable = true; + put_page(page[0]); + page[0] = wpage[0]; + } + npages = 1; + } } if (unlikely(npages != 1)) { @@ -1016,11 +1039,12 @@ return_fault_page: pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr) { - return hva_to_pfn(kvm, addr, true, NULL); + return hva_to_pfn(kvm, addr, true, NULL, true, NULL); } EXPORT_SYMBOL_GPL(hva_to_pfn_atomic); -static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async) +static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async, + bool write_fault, bool *writable) { unsigned long addr; @@ -1033,32 +1057,40 @@ static pfn_t __gfn_to_pfn(struct kvm *kv return page_to_pfn(bad_page); } - return hva_to_pfn(kvm, addr, atomic, async); + return hva_to_pfn(kvm, addr, atomic, async, write_fault, writable); } pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn) { - return __gfn_to_pfn(kvm, gfn, true, NULL); + return __gfn_to_pfn(kvm, gfn, true, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic); -pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async) +pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async, + bool write_fault, bool *writable) { - return __gfn_to_pfn(kvm, gfn, false, async); + return __gfn_to_pfn(kvm, gfn, false, async, write_fault, writable); } EXPORT_SYMBOL_GPL(gfn_to_pfn_async); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) { - return __gfn_to_pfn(kvm, gfn, false, NULL); + return __gfn_to_pfn(kvm, gfn, false, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn); +pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, + bool *writable) +{ + return __gfn_to_pfn(kvm, gfn, false, NULL, write_fault, writable); +} +EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); + pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn) { unsigned long addr = gfn_to_hva_memslot(slot, gfn); - return hva_to_pfn(kvm, addr, false, NULL); + return hva_to_pfn(kvm, addr, false, NULL, true, NULL); } int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages, ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings 2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti @ 2010-10-20 10:36 ` Avi Kivity 0 siblings, 0 replies; 12+ messages in thread From: Avi Kivity @ 2010-10-20 10:36 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm, aarcange On 10/19/2010 06:26 PM, Marcelo Tosatti wrote: > As suggested by Andrea, pass r/w error code to gup(), upgrading read fault > to writable if host pte allows it. > Looks good. > > static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic, > - bool *async) > + bool *async, bool write_fault, bool *writable) These sort of changes make reading the callers impossible, lots of true/false and NULL parameters which we have no idea what they mean. We should (later) change the functions to accept a structure with named fields for these parameters instead. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 0/4] support read-only mappings (v2) 2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti ` (2 preceding siblings ...) 2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti @ 2010-10-22 16:18 ` Marcelo Tosatti 2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti ` (4 more replies) 3 siblings, 5 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange v2: - remove kvm_mmu_set_base_ptes - fix mmu_set_spte overwrite thinko ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti @ 2010-10-22 16:18 ` Marcelo Tosatti 2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti ` (3 subsequent siblings) 4 siblings, 0 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: vmx-remove-base-ptes --] [-- Type: text/plain, Size: 1145 bytes --] The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/vmx.c =================================================================== --- kvm.orig/arch/x86/kvm/vmx.c +++ kvm/arch/x86/kvm/vmx.c @@ -3477,7 +3477,7 @@ static int handle_ept_violation(struct k gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); trace_kvm_page_fault(gpa, exit_qualification); - return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0); + return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3); } static u64 ept_rsvd_mask(u64 spte, int level) @@ -4415,8 +4415,6 @@ static int __init vmx_init(void) if (enable_ept) { bypass_guest_pf = 0; - kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK | - VMX_EPT_WRITABLE_MASK); kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull, VMX_EPT_EXECUTABLE_MASK); kvm_enable_tdp(); ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti 2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti @ 2010-10-22 16:18 ` Marcelo Tosatti 2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti ` (2 subsequent siblings) 4 siblings, 0 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: vmx-remove-set-base-ptes --] [-- Type: text/plain, Size: 1661 bytes --] Unused. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/mmu.c =================================================================== --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -196,7 +196,6 @@ static struct percpu_counter kvm_total_u static u64 __read_mostly shadow_trap_nonpresent_pte; static u64 __read_mostly shadow_notrap_nonpresent_pte; -static u64 __read_mostly shadow_base_present_pte; static u64 __read_mostly shadow_nx_mask; static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ static u64 __read_mostly shadow_user_mask; @@ -215,12 +214,6 @@ void kvm_mmu_set_nonpresent_ptes(u64 tra } EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes); -void kvm_mmu_set_base_ptes(u64 base_pte) -{ - shadow_base_present_pte = base_pte; -} -EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes); - void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, u64 dirty_mask, u64 nx_mask, u64 x_mask) { @@ -1974,7 +1967,7 @@ static int set_spte(struct kvm_vcpu *vcp * whether the guest actually used the pte (in order to detect * demand paging). */ - spte = shadow_base_present_pte; + spte = PT_PRESENT_MASK; if (!speculative) spte |= shadow_accessed_mask; if (!dirty) Index: kvm/arch/x86/kvm/x86.c =================================================================== --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -4688,7 +4688,6 @@ int kvm_arch_init(void *opaque) kvm_x86_ops = ops; kvm_mmu_set_nonpresent_ptes(0ull, 0ull); - kvm_mmu_set_base_ptes(PT_PRESENT_MASK); kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK, PT_DIRTY_MASK, PT64_NX_MASK, 0); ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti 2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti 2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti @ 2010-10-22 16:18 ` Marcelo Tosatti 2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti 2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity 4 siblings, 0 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: mmu-flush-tlb-on-overwrite --] [-- Type: text/plain, Size: 1074 bytes --] This can happen in the following scenario: vcpu0 vcpu1 read fault gup(.write=0) gup(.write=1) reuse swap cache, no COW set writable spte use writable spte set read-only spte Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/mmu.c =================================================================== --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2068,6 +2068,16 @@ static void mmu_set_spte(struct kvm_vcpu spte_to_pfn(*sptep), pfn); drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); kvm_flush_remote_tlbs(vcpu->kvm); + /* + * If we overwrite a writable spte with a read-only one, + * drop it and flush remote TLBs. Otherwise rmap_write_protect + * will find a read-only spte, even though the writable spte + * might be cached on a CPU's TLB. + */ + } else if (is_writable_pte(*sptep) && + (!(pte_access & ACC_WRITE_MASK) || !dirty)) { + drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(vcpu->kvm); } else was_rmapped = 1; } ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti ` (2 preceding siblings ...) 2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti @ 2010-10-22 16:18 ` Marcelo Tosatti 2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity 4 siblings, 0 replies; 12+ messages in thread From: Marcelo Tosatti @ 2010-10-22 16:18 UTC (permalink / raw) To: kvm; +Cc: avi, aarcange, Marcelo Tosatti [-- Attachment #1: gfn-to-pfn-read-only --] [-- Type: text/plain, Size: 9862 bytes --] As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Index: kvm/arch/x86/kvm/mmu.c =================================================================== --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2215,7 +2215,7 @@ static void direct_pte_prefetch(struct k } static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int level, gfn_t gfn, pfn_t pfn) + int map_writable, int level, gfn_t gfn, pfn_t pfn) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -2224,9 +2224,13 @@ static int __direct_map(struct kvm_vcpu for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL, + unsigned pte_access = ACC_ALL; + + if (!map_writable) + pte_access &= ~ACC_WRITE_MASK; + mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access, 0, write, 1, &pt_write, - level, gfn, pfn, false, true); + level, gfn, pfn, false, map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; @@ -2287,6 +2291,7 @@ static int nonpaging_map(struct kvm_vcpu int level; pfn_t pfn; unsigned long mmu_seq; + bool map_writable; level = mapping_level(vcpu, gfn); @@ -2301,7 +2306,7 @@ static int nonpaging_map(struct kvm_vcpu mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - pfn = gfn_to_pfn(vcpu->kvm, gfn); + pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, &map_writable); /* mmio */ if (is_error_pfn(pfn)) @@ -2311,7 +2316,7 @@ static int nonpaging_map(struct kvm_vcpu if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; kvm_mmu_free_some_pages(vcpu); - r = __direct_map(vcpu, v, write, level, gfn, pfn); + r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn); spin_unlock(&vcpu->kvm->mmu_lock); @@ -2609,11 +2614,11 @@ static bool can_do_async_pf(struct kvm_v } static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, - gva_t gva, pfn_t *pfn) + gva_t gva, pfn_t *pfn, bool write, bool *writable) { bool async; - *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async); + *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable); if (!async) return false; /* *pfn has correct page already */ @@ -2630,7 +2635,7 @@ static bool try_async_pf(struct kvm_vcpu return true; } - *pfn = gfn_to_pfn(vcpu->kvm, gfn); + *pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable); return false; } @@ -2643,6 +2648,8 @@ static int tdp_page_fault(struct kvm_vcp int level; gfn_t gfn = gpa >> PAGE_SHIFT; unsigned long mmu_seq; + int write = error_code & PFERR_WRITE_MASK; + bool map_writable; ASSERT(vcpu); ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -2658,7 +2665,7 @@ static int tdp_page_fault(struct kvm_vcp mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn)) + if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable)) return 0; /* mmio */ @@ -2668,7 +2675,7 @@ static int tdp_page_fault(struct kvm_vcp if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; kvm_mmu_free_some_pages(vcpu); - r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, + r = __direct_map(vcpu, gpa, write, map_writable, level, gfn, pfn); spin_unlock(&vcpu->kvm->mmu_lock); Index: kvm/arch/x86/kvm/paging_tmpl.h =================================================================== --- kvm.orig/arch/x86/kvm/paging_tmpl.h +++ kvm/arch/x86/kvm/paging_tmpl.h @@ -427,7 +427,7 @@ static void FNAME(pte_prefetch)(struct k static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct guest_walker *gw, int user_fault, int write_fault, int hlevel, - int *ptwrite, pfn_t pfn) + int *ptwrite, pfn_t pfn, bool map_writable) { unsigned access = gw->pt_access; struct kvm_mmu_page *sp = NULL; @@ -501,7 +501,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access, user_fault, write_fault, dirty, ptwrite, it.level, - gw->gfn, pfn, false, true); + gw->gfn, pfn, false, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep); return it.sptep; @@ -539,6 +539,7 @@ static int FNAME(page_fault)(struct kvm_ pfn_t pfn; int level = PT_PAGE_TABLE_LEVEL; unsigned long mmu_seq; + bool map_writable; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); @@ -569,13 +570,17 @@ static int FNAME(page_fault)(struct kvm_ mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn)) + if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pfn, write_fault, + &map_writable)) return 0; /* mmio */ if (is_error_pfn(pfn)) return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); + if (!map_writable) + walker.pte_access &= ~ACC_WRITE_MASK; + spin_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; @@ -583,7 +588,7 @@ static int FNAME(page_fault)(struct kvm_ trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); kvm_mmu_free_some_pages(vcpu); sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, - level, &write_pt, pfn); + level, &write_pt, pfn, map_writable); (void)sptep; pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, sptep, *sptep, write_pt); Index: kvm/include/linux/kvm_host.h =================================================================== --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -334,8 +334,11 @@ void kvm_set_page_accessed(struct page * pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr); pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn); -pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async); +pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async, + bool write_fault, bool *writable); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); +pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, + bool *writable); pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); int memslot_id(struct kvm *kvm, gfn_t gfn); Index: kvm/virt/kvm/kvm_main.c =================================================================== --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -959,7 +959,7 @@ static pfn_t get_fault_pfn(void) } static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic, - bool *async) + bool *async, bool write_fault, bool *writable) { struct page *page[1]; int npages = 0; @@ -968,19 +968,42 @@ static pfn_t hva_to_pfn(struct kvm *kvm, /* we can do it either atomically or asynchronously, not both */ BUG_ON(atomic && async); + BUG_ON(!write_fault && !writable); + + if (writable) + *writable = true; + if (atomic || async) npages = __get_user_pages_fast(addr, 1, 1, page); if (unlikely(npages != 1) && !atomic) { might_sleep(); + if (writable) + *writable = write_fault; + if (async) { down_read(¤t->mm->mmap_sem); npages = get_user_pages_noio(current, current->mm, - addr, 1, 1, 0, page, NULL); + addr, 1, write_fault, 0, + page, NULL); up_read(¤t->mm->mmap_sem); } else - npages = get_user_pages_fast(addr, 1, 1, page); + npages = get_user_pages_fast(addr, 1, write_fault, + page); + + /* map read fault as writable if possible */ + if (unlikely(!write_fault) && npages == 1) { + struct page *wpage[1]; + + npages = __get_user_pages_fast(addr, 1, 1, wpage); + if (npages == 1) { + *writable = true; + put_page(page[0]); + page[0] = wpage[0]; + } + npages = 1; + } } if (unlikely(npages != 1)) { @@ -1018,11 +1041,12 @@ static pfn_t hva_to_pfn(struct kvm *kvm, pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr) { - return hva_to_pfn(kvm, addr, true, NULL); + return hva_to_pfn(kvm, addr, true, NULL, true, NULL); } EXPORT_SYMBOL_GPL(hva_to_pfn_atomic); -static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async) +static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async, + bool write_fault, bool *writable) { unsigned long addr; @@ -1035,32 +1059,40 @@ static pfn_t __gfn_to_pfn(struct kvm *kv return page_to_pfn(bad_page); } - return hva_to_pfn(kvm, addr, atomic, async); + return hva_to_pfn(kvm, addr, atomic, async, write_fault, writable); } pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn) { - return __gfn_to_pfn(kvm, gfn, true, NULL); + return __gfn_to_pfn(kvm, gfn, true, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic); -pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async) +pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async, + bool write_fault, bool *writable) { - return __gfn_to_pfn(kvm, gfn, false, async); + return __gfn_to_pfn(kvm, gfn, false, async, write_fault, writable); } EXPORT_SYMBOL_GPL(gfn_to_pfn_async); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) { - return __gfn_to_pfn(kvm, gfn, false, NULL); + return __gfn_to_pfn(kvm, gfn, false, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn); +pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, + bool *writable) +{ + return __gfn_to_pfn(kvm, gfn, false, NULL, write_fault, writable); +} +EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); + pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn) { unsigned long addr = gfn_to_hva_memslot(slot, gfn); - return hva_to_pfn(kvm, addr, false, NULL); + return hva_to_pfn(kvm, addr, false, NULL, true, NULL); } int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages, ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/4] support read-only mappings (v2) 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti ` (3 preceding siblings ...) 2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti @ 2010-10-27 9:20 ` Avi Kivity 4 siblings, 0 replies; 12+ messages in thread From: Avi Kivity @ 2010-10-27 9:20 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm, aarcange On 10/22/2010 06:18 PM, Marcelo Tosatti wrote: > v2: > - remove kvm_mmu_set_base_ptes > - fix mmu_set_spte overwrite thinko > > Applied, thanks. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-10-27 9:20 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-10-19 16:26 [patch 0/3] [RFC] support read-only mappings Marcelo Tosatti 2010-10-19 16:26 ` [patch 1/3] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti 2010-10-20 10:24 ` Avi Kivity 2010-10-19 16:26 ` [patch 2/3] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti 2010-10-19 16:26 ` [patch 3/3] KVM: propagate fault r/w information to gup(), allow read-only mappings Marcelo Tosatti 2010-10-20 10:36 ` Avi Kivity 2010-10-22 16:18 ` [patch 0/4] support read-only mappings (v2) Marcelo Tosatti 2010-10-22 16:18 ` [patch 1/4] KVM: VMX: remove setting of shadow_base_ptes for EPT Marcelo Tosatti 2010-10-22 16:18 ` [patch 2/4] KVM: MMU: remove kvm_mmu_set_base_ptes Marcelo Tosatti 2010-10-22 16:18 ` [patch 3/4] KVM: MMU: flush TLBs on writable -> read-only spte overwrite Marcelo Tosatti 2010-10-22 16:18 ` [patch 4/4] KVM: propagate fault r/w information to gup(), allow read-only memory Marcelo Tosatti 2010-10-27 9:20 ` [patch 0/4] support read-only mappings (v2) Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox