* [PATCH] Fix SMP shadow instantiation race @ 2007-12-10 16:19 Marcelo Tosatti 2007-12-10 17:07 ` Avi Kivity 0 siblings, 1 reply; 7+ messages in thread From: Marcelo Tosatti @ 2007-12-10 16:19 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 is updating it, which results in a stale shadow copy. Fix that by comparing the contents of the cached guest pte with the current guest pte after write-protecting the guest pagetable. Attached program kvm_shadow_race.c demonstrates the problem. Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h index 72d4816..4fece01 100644 --- a/drivers/kvm/paging_tmpl.h +++ b/drivers/kvm/paging_tmpl.h @@ -66,6 +66,7 @@ struct guest_walker { int level; gfn_t table_gfn[PT_MAX_FULL_LEVELS]; pt_element_t pte; + gpa_t pte_gpa; unsigned pt_access; unsigned pte_access; gfn_t gfn; @@ -212,6 +213,7 @@ walk: } walker->pte = pte; + walker->pte_gpa = pte_gpa; walker->pt_access = pt_access; walker->pte_access = pte_access; pgprintk("%s: pte %llx pte_access %x pt_access %x\n", @@ -267,6 +269,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, int level; u64 *shadow_ent; unsigned access = walker->pt_access; + pt_element_t curr_pte; if (!is_present_pte(walker->pte)) return NULL; @@ -316,6 +319,11 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, *shadow_ent = shadow_pte; } + kvm_read_guest(vcpu->kvm, walker->pte_gpa, &curr_pte, sizeof(curr_pte)); + + if (curr_pte != walker->pte) + return 0; + mmu_set_spte(vcpu, shadow_ent, access, walker->pte_access & access, user_fault, write_fault, walker->pte & PT_DIRTY_MASK, ptwrite, walker->gfn); @@ -382,10 +390,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, /* * mmio: emulate if accessible, otherwise its a guest fault. */ - if (is_io_pte(*shadow_pte)) + if (shadow_pte && is_io_pte(*shadow_pte)) return 1; - ++vcpu->stat.pf_fixed; + if (shadow_pte) + ++vcpu->stat.pf_fixed; kvm_mmu_audit(vcpu, "post page fault (fixed)"); return write_pt; ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Fix SMP shadow instantiation race 2007-12-10 16:19 [PATCH] Fix SMP shadow instantiation race Marcelo Tosatti @ 2007-12-10 17:07 ` Avi Kivity [not found] ` <475D726A.2040901-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Avi Kivity @ 2007-12-10 17:07 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 > is updating it, which results in a stale shadow copy. > > Fix that by comparing the contents of the cached guest pte with the > current guest pte after write-protecting the guest pagetable. > > Attached program kvm_shadow_race.c demonstrates the problem. > > Where is it? > Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h > index 72d4816..4fece01 100644 > --- a/drivers/kvm/paging_tmpl.h > +++ b/drivers/kvm/paging_tmpl.h > @@ -66,6 +66,7 @@ struct guest_walker { > int level; > gfn_t table_gfn[PT_MAX_FULL_LEVELS]; > pt_element_t pte; > + gpa_t pte_gpa; > I think this needs to be an array like table_gfn[]. The guest may play with the pde (and upper entries) as well as the pte. > > + kvm_read_guest(vcpu->kvm, walker->pte_gpa, &curr_pte, sizeof(curr_pte)); > + > + if (curr_pte != walker->pte) > + return 0; > + > 'return NULL' It would also be preferable to read the pte only if we shadowed the page just now. Perhaps pass the pte and the index to kvm_mmu_get_page() which would use them as a guard when the page is being shadowed: if (lookup page succeeds) return it shadow page write protect it if (guard check succeeds) return it else return NULL or perhaps have kvm_mmu_get_page() return an additional bool signifying it is a new page. but this is ugly. > > - ++vcpu->stat.pf_fixed; > + if (shadow_pte) > + ++vcpu->stat.pf_fixed; > This is a very rare case; it isn't worth being so accurate maintaining the statistics. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <475D726A.2040901-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] Fix SMP shadow instantiation race [not found] ` <475D726A.2040901-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2007-12-10 19:12 ` Marcelo Tosatti 2007-12-10 21:27 ` Avi Kivity 2007-12-12 0:12 ` Marcelo Tosatti 1 sibling, 1 reply; 7+ messages in thread From: Marcelo Tosatti @ 2007-12-10 19:12 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, kvm-devel [-- Attachment #1: Type: text/plain, Size: 2176 bytes --] On Mon, Dec 10, 2007 at 07:07:54PM +0200, Avi Kivity wrote: > Marcelo Tosatti wrote: > >There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 > >is updating it, which results in a stale shadow copy. > > > >Fix that by comparing the contents of the cached guest pte with the > >current guest pte after write-protecting the guest pagetable. > > > >Attached program kvm_shadow_race.c demonstrates the problem. > > > > > > Where is it? Attached. > >Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > > > >diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h > >index 72d4816..4fece01 100644 > >--- a/drivers/kvm/paging_tmpl.h > >+++ b/drivers/kvm/paging_tmpl.h > >@@ -66,6 +66,7 @@ struct guest_walker { > > int level; > > gfn_t table_gfn[PT_MAX_FULL_LEVELS]; > > pt_element_t pte; > >+ gpa_t pte_gpa; > > > > I think this needs to be an array like table_gfn[]. The guest may play > with the pde (and upper entries) as well as the pte. I was working under the assumption that the only significant bits of upper entries (WRITEABLE and PRESENT) that can be changed by the guest must be reflected first in the lower level pte's. Isnt that a fair assumption to make? > >+ kvm_read_guest(vcpu->kvm, walker->pte_gpa, &curr_pte, > >sizeof(curr_pte)); > >+ > >+ if (curr_pte != walker->pte) > >+ return 0; > >+ > > > > 'return NULL' > > It would also be preferable to read the pte only if we shadowed the page > just now. Perhaps pass the pte and the index to kvm_mmu_get_page() > which would use them as a guard when the page is being shadowed: > > if (lookup page succeeds) > return it > shadow page > write protect it > if (guard check succeeds) > return it > else > return NULL > > or perhaps have kvm_mmu_get_page() return an additional bool signifying > it is a new page. but this is ugly. > > > > >- ++vcpu->stat.pf_fixed; > >+ if (shadow_pte) > >+ ++vcpu->stat.pf_fixed; > > > > This is a very rare case; it isn't worth being so accurate maintaining > the statistics. > > -- > error compiling committee.c: too many arguments to function [-- Attachment #2: kvm_shadow_race.c --] [-- Type: text/plain, Size: 1502 bytes --] #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <sys/time.h> #include <pthread.h> struct thread_param { void *mmaped_region; int start; int len; }; void* do_read(void *arg) { char buf[4096]; void *pos, *end;; struct thread_param *t = (struct thread_param *)arg; sleep(1); end = t->mmaped_region + t->len; for (pos=t->mmaped_region; pos < end; pos += sizeof(buf)) memcpy(&buf, pos, sizeof(buf)); } #define FLEN 128*1024*1024 int main(void) { int fd, err, i; char buf[4096]; void *mmaped_region; fd = open("/tmp/largefile", O_RDWR|O_CREAT); if (!fd) { perror("failed to create /tmp/largefile"); exit(0); } chmod("/tmp/largefile", S_IRUSR|S_IWUSR); memset(buf, 0xf, sizeof(buf)); for (i=0; i<FLEN; i+=sizeof buf) { err = write(fd, buf, sizeof(buf)); if (err < 0) { perror("write"); exit(0); } } mmaped_region = mmap(0, FLEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (mmaped_region == MAP_FAILED) { perror("mmap"); exit(0); } for (i = 0; i < FLEN; i += FLEN/32) { pthread_t thread; struct thread_param *t = malloc(sizeof(struct thread_param)); t->mmaped_region = mmaped_region; t->start = i; t->len = FLEN/32; if (pthread_create(&thread, NULL, do_read, t)) perror("pthread_create"); } sleep(1); i = mprotect(mmaped_region, FLEN, PROT_READ); if (i < 0) perror("mprotect"); return 0; } [-- Attachment #3: Type: text/plain, Size: 277 bytes --] ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php [-- Attachment #4: Type: text/plain, Size: 186 bytes --] _______________________________________________ kvm-devel mailing list kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org https://lists.sourceforge.net/lists/listinfo/kvm-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Fix SMP shadow instantiation race 2007-12-10 19:12 ` Marcelo Tosatti @ 2007-12-10 21:27 ` Avi Kivity [not found] ` <475DAF51.8060804-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Avi Kivity @ 2007-12-10 21:27 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > On Mon, Dec 10, 2007 at 07:07:54PM +0200, Avi Kivity wrote: > >> Marcelo Tosatti wrote: >> >>> There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 >>> is updating it, which results in a stale shadow copy. >>> >>> Fix that by comparing the contents of the cached guest pte with the >>> current guest pte after write-protecting the guest pagetable. >>> >>> Attached program kvm_shadow_race.c demonstrates the problem. >>> >>> >>> >> Where is it? >> > > Attached. > > Can you explain what it does? I get the same results on both host and guest (successful completion). >>> diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h >>> index 72d4816..4fece01 100644 >>> --- a/drivers/kvm/paging_tmpl.h >>> +++ b/drivers/kvm/paging_tmpl.h >>> @@ -66,6 +66,7 @@ struct guest_walker { >>> int level; >>> gfn_t table_gfn[PT_MAX_FULL_LEVELS]; >>> pt_element_t pte; >>> + gpa_t pte_gpa; >>> >>> >> I think this needs to be an array like table_gfn[]. The guest may play >> with the pde (and upper entries) as well as the pte. >> > > I was working under the assumption that the only significant bits of > upper entries (WRITEABLE and PRESENT) that can be changed by the guest > must be reflected first in the lower level pte's. > > Isnt that a fair assumption to make? > > The other bits (including the physical addresses) may change too. There is no requirement that the changes to pde write/present bits be reflected on pte write/present bits. Consider a unix kernel implementing fork() by write-protecting the pud tables. It can write protect the entire user address space by clearing the write bit on the first 256 pgd entries. (I don't think Linux does that; maybe that's a worthwhile optimization) -- Any sufficiently difficult bug is indistinguishable from a feature. ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <475DAF51.8060804-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] Fix SMP shadow instantiation race [not found] ` <475DAF51.8060804-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2007-12-10 22:22 ` Marcelo Tosatti 0 siblings, 0 replies; 7+ messages in thread From: Marcelo Tosatti @ 2007-12-10 22:22 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel On Mon, Dec 10, 2007 at 11:27:45PM +0200, Avi Kivity wrote: > Marcelo Tosatti wrote: > > On Mon, Dec 10, 2007 at 07:07:54PM +0200, Avi Kivity wrote: > > > >> Marcelo Tosatti wrote: > >> > >>> There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 > >>> is updating it, which results in a stale shadow copy. > >>> > >>> Fix that by comparing the contents of the cached guest pte with the > >>> current guest pte after write-protecting the guest pagetable. > >>> > >>> Attached program kvm_shadow_race.c demonstrates the problem. > >>> > >>> > >>> > >> Where is it? > >> > > > > Attached. > > > > > > Can you explain what it does? I get the same results on both host and > guest (successful completion). What it does is: 1) mmap 128MB file as (PROT_READ|PROT_WRITE) 2) starts 32 threads to read 128/32 bytes each 3) calls mprotect(PROT_READ) on that region 2) and 3) run simultaneously. I added a printk inside the fault handler to read and compare the original and the just shadowed pte, and you can sometimes see that they differ (it takes about 10-20 runs of the program to hit the race on a 4-way host with 4-way guest), in that the shadowed PTE has the writeable bit set but the original doesnt. The program will always succeed even if the race happens. It could be a similar scenario such as the one you mentioned earlier: guest kernel is nukeing a pte to reclaim a page while another CPU is instantiating that shadow pte, in which case there would be a shadow page mapping for a now freed page, resulting in data corruption. The kernel sets the PTE to zero and then flushes the TLB to do that, but for KVM the TLB flush has no effect. > >>> diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h > >>> index 72d4816..4fece01 100644 > >>> --- a/drivers/kvm/paging_tmpl.h > >>> +++ b/drivers/kvm/paging_tmpl.h > >>> @@ -66,6 +66,7 @@ struct guest_walker { > >>> int level; > >>> gfn_t table_gfn[PT_MAX_FULL_LEVELS]; > >>> pt_element_t pte; > >>> + gpa_t pte_gpa; > >>> > >>> > >> I think this needs to be an array like table_gfn[]. The guest may play > >> with the pde (and upper entries) as well as the pte. > >> > > > > I was working under the assumption that the only significant bits of > > upper entries (WRITEABLE and PRESENT) that can be changed by the guest > > must be reflected first in the lower level pte's. > > > > Isnt that a fair assumption to make? > > > > > > The other bits (including the physical addresses) may change too. There > is no requirement that the changes to pde write/present bits be > reflected on pte write/present bits. > > Consider a unix kernel implementing fork() by write-protecting the pud > tables. It can write protect the entire user address space by clearing > the write bit on the first 256 pgd entries. > > (I don't think Linux does that; maybe that's a worthwhile optimization) OK, I'll do as you suggest comparing at mmu_get_page() just after shadowing (which also gets rid of the copy/test if the page is already shadowed). Thanks ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Fix SMP shadow instantiation race [not found] ` <475D726A.2040901-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 2007-12-10 19:12 ` Marcelo Tosatti @ 2007-12-12 0:12 ` Marcelo Tosatti 2007-12-13 8:37 ` Avi Kivity 1 sibling, 1 reply; 7+ messages in thread From: Marcelo Tosatti @ 2007-12-12 0:12 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel On Mon, Dec 10, 2007 at 07:07:54PM +0200, Avi Kivity wrote: > Marcelo Tosatti wrote: > >There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 > >is updating it, which results in a stale shadow copy. > > > >Fix that by comparing the contents of the cached guest pte with the > >current guest pte after write-protecting the guest pagetable. > > > >Attached program kvm_shadow_race.c demonstrates the problem. > > > > > > Where is it? > > >Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > > > >diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h > >index 72d4816..4fece01 100644 > >--- a/drivers/kvm/paging_tmpl.h > >+++ b/drivers/kvm/paging_tmpl.h > >@@ -66,6 +66,7 @@ struct guest_walker { > > int level; > > gfn_t table_gfn[PT_MAX_FULL_LEVELS]; > > pt_element_t pte; > >+ gpa_t pte_gpa; > > > > I think this needs to be an array like table_gfn[]. The guest may play > with the pde (and upper entries) as well as the pte. Alright, then it also needs to record the cached pte's for upper levels. > > > >+ kvm_read_guest(vcpu->kvm, walker->pte_gpa, &curr_pte, > >sizeof(curr_pte)); > >+ > >+ if (curr_pte != walker->pte) > >+ return 0; > >+ > > > > 'return NULL' > > It would also be preferable to read the pte only if we shadowed the page > just now. Perhaps pass the pte and the index to kvm_mmu_get_page() > which would use them as a guard when the page is being shadowed: > > if (lookup page succeeds) > return it > shadow page > write protect it > if (guard check succeeds) > return it > else > return NULL > > or perhaps have kvm_mmu_get_page() return an additional bool signifying > it is a new page. but this is ugly. Given that it would be necessary to determine the size of pt_element_t inside kvm_mmu_get_page(), I prefer the latter: ------------- There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 is updating it, which results in a stale shadow copy. Fix that by comparing the contents of the cached guest pte with the current guest pte after write-protecting the guest pagetable. Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c index ba71e8d..92ac0d1 100644 --- a/drivers/kvm/mmu.c +++ b/drivers/kvm/mmu.c @@ -681,7 +681,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, unsigned level, int metaphysical, unsigned access, - u64 *parent_pte) + u64 *parent_pte, + bool *new_page) { union kvm_mmu_page_role role; unsigned index; @@ -720,6 +721,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, vcpu->mmu.prefetch_page(vcpu, sp); if (!metaphysical) rmap_write_protect(vcpu->kvm, gfn); + if (new_page) + *new_page = 1; return sp; } @@ -993,7 +996,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) >> PAGE_SHIFT; new_table = kvm_mmu_get_page(vcpu, pseudo_gfn, v, level - 1, - 1, ACC_ALL, &table[index]); + 1, ACC_ALL, &table[index], + NULL); if (!new_table) { pgprintk("nonpaging_map: ENOMEM\n"); return -ENOMEM; @@ -1059,7 +1063,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu) ASSERT(!VALID_PAGE(root)); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, - PT64_ROOT_LEVEL, 0, ACC_ALL, NULL); + PT64_ROOT_LEVEL, 0, ACC_ALL, NULL, NULL); root = __pa(sp->spt); ++sp->root_count; vcpu->mmu.root_hpa = root; @@ -1080,7 +1084,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu) root_gfn = 0; sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, !is_paging(vcpu), - ACC_ALL, NULL); + ACC_ALL, NULL, NULL); root = __pa(sp->spt); ++sp->root_count; vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK; diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h index 72d4816..c3ddcf2 100644 --- a/drivers/kvm/paging_tmpl.h +++ b/drivers/kvm/paging_tmpl.h @@ -65,7 +65,8 @@ struct guest_walker { int level; gfn_t table_gfn[PT_MAX_FULL_LEVELS]; - pt_element_t pte; + pt_element_t ptes[PT_MAX_FULL_LEVELS]; + gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; unsigned pt_access; unsigned pte_access; gfn_t gfn; @@ -150,6 +151,7 @@ walk: pte_gpa = gfn_to_gpa(table_gfn); pte_gpa += index * sizeof(pt_element_t); walker->table_gfn[walker->level - 1] = table_gfn; + walker->pte_gpa[walker->level - 1] = pte_gpa; pgprintk("%s: table_gfn[%d] %lx\n", __FUNCTION__, walker->level - 1, table_gfn); @@ -180,6 +182,8 @@ walk: pte_access = pt_access & FNAME(gpte_access)(vcpu, pte); + walker->ptes[walker->level - 1] = pte; + if (walker->level == PT_PAGE_TABLE_LEVEL) { walker->gfn = gpte_to_gfn(pte); break; @@ -209,9 +213,9 @@ walk: goto walk; pte |= PT_DIRTY_MASK; kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte)); + walker->ptes[walker->level - 1] = pte; } - walker->pte = pte; walker->pt_access = pt_access; walker->pte_access = pte_access; pgprintk("%s: pte %llx pte_access %x pt_access %x\n", @@ -268,7 +272,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, u64 *shadow_ent; unsigned access = walker->pt_access; - if (!is_present_pte(walker->pte)) + if (!is_present_pte(walker->ptes[walker->level - 1])) return NULL; shadow_addr = vcpu->mmu.root_hpa; @@ -285,6 +289,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, u64 shadow_pte; int metaphysical; gfn_t table_gfn; + bool new_page = 0; shadow_ent = ((u64 *)__va(shadow_addr)) + index; if (is_shadow_present_pte(*shadow_ent)) { @@ -300,16 +305,23 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, if (level - 1 == PT_PAGE_TABLE_LEVEL && walker->level == PT_DIRECTORY_LEVEL) { metaphysical = 1; - if (!is_dirty_pte(walker->pte)) + if (!is_dirty_pte(walker->ptes[level - 1])) access &= ~ACC_WRITE_MASK; - table_gfn = gpte_to_gfn(walker->pte); + table_gfn = gpte_to_gfn(walker->ptes[level - 1]); } else { metaphysical = 0; table_gfn = walker->table_gfn[level - 2]; } shadow_page = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1, metaphysical, access, - shadow_ent); + shadow_ent, &new_page); + if (new_page && !metaphysical) { + pt_element_t curr_pte; + kvm_read_guest(vcpu->kvm, walker->pte_gpa[level - 2], + &curr_pte, sizeof(curr_pte)); + if (curr_pte != walker->ptes[level - 2]) + return NULL; + } shadow_addr = __pa(shadow_page->spt); shadow_pte = shadow_addr | PT_PRESENT_MASK | PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK; @@ -317,7 +329,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } mmu_set_spte(vcpu, shadow_ent, access, walker->pte_access & access, - user_fault, write_fault, walker->pte & PT_DIRTY_MASK, + user_fault, write_fault, + walker->ptes[walker->level-1] & PT_DIRTY_MASK, ptwrite, walker->gfn); return shadow_ent; @@ -382,7 +395,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, /* * mmio: emulate if accessible, otherwise its a guest fault. */ - if (is_io_pte(*shadow_pte)) + if (shadow_pte && is_io_pte(*shadow_pte)) return 1; ++vcpu->stat.pf_fixed; ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Fix SMP shadow instantiation race 2007-12-12 0:12 ` Marcelo Tosatti @ 2007-12-13 8:37 ` Avi Kivity 0 siblings, 0 replies; 7+ messages in thread From: Avi Kivity @ 2007-12-13 8:37 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > There is a race where VCPU0 is shadowing a pagetable entry while VCPU1 > is updating it, which results in a stale shadow copy. > > Fix that by comparing the contents of the cached guest pte with the > current guest pte after write-protecting the guest pagetable. > > Applied, thanks. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-12-13 8:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-10 16:19 [PATCH] Fix SMP shadow instantiation race Marcelo Tosatti
2007-12-10 17:07 ` Avi Kivity
[not found] ` <475D726A.2040901-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-10 19:12 ` Marcelo Tosatti
2007-12-10 21:27 ` Avi Kivity
[not found] ` <475DAF51.8060804-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-10 22:22 ` Marcelo Tosatti
2007-12-12 0:12 ` Marcelo Tosatti
2007-12-13 8:37 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox